I made a rule that works rather well, but with a few exceptions that I
think are caused by the tokenizer.

The rule (intended to catch all sentences containing just one word, to be
expanded later to catch 2, 3, etc., so as to start capturing unusual
sentence constructions structurally; a rough sketch of the two-word
variant follows below the rule):

<category name="Experimenteel, vangen van zinnen">
  <rule id="VANG_1" name="Vang 1 woord">
    <pattern>
      <token postag="SENT_START"/>
      <token><exception regexp="yes">[0-9]{1,2}|[a-z]</exception></token>
      <token postag="SENT_END" regexp="yes">\!|\?|\.</token>
    </pattern>
    <message></message>
    <example type="incorrect"><marker>Hij.</marker></example>
  <example type="correct">Hij kan het</example>
  </rule>
</category>
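
Since I plan to extend this, here is a rough, untested sketch of how I
imagine the two-word variant could look (the id VANG_2, its name, and the
example sentences are just placeholders; the message is left empty like
in VANG_1 for now):

  <rule id="VANG_2" name="Vang 2 woorden">
    <pattern>
      <token postag="SENT_START"/>
      <!-- two tokens, each with the same exception list as VANG_1 -->
      <token><exception regexp="yes">[0-9]{1,2}|[a-z]</exception></token>
      <token><exception regexp="yes">[0-9]{1,2}|[a-z]</exception></token>
      <token postag="SENT_END" regexp="yes">\!|\?|\.</token>
    </pattern>
    <message></message>
    <example type="incorrect"><marker>Hij kan.</marker></example>
    <example type="correct">Hij kan het</example>
  </rule>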

This rule catches:

1°.
-3.

Is there an explanation for why the rule catches two-character tokens
like -3 and 1°?

Ruud

