I made a rule that works rather well, but with a few exceptions that I think are caused by the tokenizer.
The rule (intended to catch all sentences containing just 1 word, to be expanded later to catch 2, 3 etc., so as to start capturing unusual sentence constructions structurally):

 <category name="Experimenteel, vangen van zinnen">
  <rule id="VANG_1" name="Vang 1 woord">
   <pattern>
    <token postag="SENT_START"/>
    <token><exception regexp="yes">[0-9]{1,2}|[a-z]</exception></token>
    <token postag="SENT_END" regexp="yes">\!|\?|\.</token>
   </pattern>
   <message></message>
   <example type="incorrect"><marker>Hij.</marker></example>
   <example type="correct">Hij kan het</example>
  </rule>
 </category>

This rule catches, for example:

 1°.
 -3.

Is there an explanation for the rule catching 2 characters in the token, like -3 and 1°?

Ruud
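PS: a minimal sketch of what the 2-word variant mentioned above could look like, assuming it only needs one extra unrestricted <token/> before the sentence-end token (untested; the id, name and examples are placeholders following the same pattern as VANG_1):

 <rule id="VANG_2" name="Vang 2 woorden">
  <pattern>
   <token postag="SENT_START"/>
   <token><exception regexp="yes">[0-9]{1,2}|[a-z]</exception></token>
   <token/>
   <token postag="SENT_END" regexp="yes">\!|\?|\.</token>
  </pattern>
  <message></message>
  <example type="incorrect"><marker>Hij kan.</marker></example>
  <example type="correct">Hij kan het wel</example>
 </rule>

The middle <token/> is left unrestricted here; whether it should carry the same exception as the first word is an open question.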