Hi,

Consider this very simple rule in the English grammar.xml:

<rule id="EGG_YOKE" name="egg yoke (egg yolk)">
  <pattern>
    <token>egg</token>
    <token>yoke</token>
  </pattern>

The rule works fine if the two words are separated by spaces, tabs or newlines. However, it does not work when the two words are separated by a non-breaking space (U+00A0). I wonder why.

With a normal space (U+0020):

$ echo "Egg yoke." | java -jar languagetool-commandline.jar -l en -v -
Expected text language: English (no spell checking active, specify a language variant like 'en-GB' if available)
Working on STDIN...
1283 rules activated for language English
1283 rules activated for language English
<S> Egg[egg/NN:UN,egg/VB,egg/VBP,B-NP-singular] yoke[yoke/NN,yoke/VB,yoke/VBP,E-NP-singular].[./.,</S>,O]<P/>

Disambiguator log:

1.) Line 1, column 1, Rule ID: EGG_YOKE[1]
Message: Did you mean 'egg yolk'?
Suggestion: Egg yolk
Egg yoke.
^^^^^^^^
Time: 2405ms for 0 sentences (0.0 sentences/sec)

With a non-breaking space:

$ echo "Egg yoke." | java -jar languagetool-commandline.jar -l en -v -
Expected text language: English (no spell checking active, specify a language variant like 'en-GB' if available)
Working on STDIN...
1283 rules activated for language English
1283 rules activated for language English
<S> Egg[egg/NN:UN,egg/VB,egg/VBP,B-NP-singular] [ /null]yoke[yoke/NN,yoke/VB,yoke/VBP].[./.,</S>]<P/>

Disambiguator log:

Time: 2347ms for 0 sentences (0.0 sentences/sec)

With the non-breaking space, the debug output shows a token [ /null], i.e. the non-breaking space is interpreted as a word rather than as whitespace. It therefore sits between "egg" and "yoke", the two tokens are no longer adjacent, and the rule does not match.

I think that spaces and non-breaking spaces should behave the same in LanguageTool. The only purpose of a non-breaking space is presentation: it prevents a line break at that position. Nothing else should be affected by it (same word count, same word tokenization, …).

Regards,
Dominique
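
P.S. To make the expected behaviour concrete, here is a minimal Python sketch. It is a toy model only and not LanguageTool's actual tokenizer or rule matcher: the names tokenize, is_whitespace and matches, and the treat_nbsp_as_space flag, are all hypothetical. It only illustrates how a non-breaking space that is kept as a word token separates "egg" from "yoke", and how treating it as whitespace restores the match.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE


def tokenize(text):
    """Toy tokenizer: words, runs of whitespace, and single other characters."""
    # In Python, \s on str patterns also matches U+00A0.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def is_whitespace(token, treat_nbsp_as_space):
    """Decide whether a token counts as whitespace.

    With treat_nbsp_as_space=False, a token containing U+00A0 is kept as a
    regular token, which mimics the behaviour described above (assumption).
    """
    if treat_nbsp_as_space:
        return token.isspace()
    return token.isspace() and NBSP not in token


def matches(text, pattern, treat_nbsp_as_space):
    """Return True if the lowercase pattern occurs as adjacent non-whitespace tokens."""
    words = [
        t.lower()
        for t in tokenize(text)
        if not is_whitespace(t, treat_nbsp_as_space)
    ]
    n = len(pattern)
    return any(words[i:i + n] == pattern for i in range(len(words) - n + 1))


if __name__ == "__main__":
    rule = ["egg", "yoke"]
    print(matches("Egg yoke.", rule, treat_nbsp_as_space=False))
    print(matches("Egg" + NBSP + "yoke.", rule, treat_nbsp_as_space=False))
    print(matches("Egg" + NBSP + "yoke.", rule, treat_nbsp_as_space=True))
```

In this sketch the only difference between the failing and the working case is the whitespace classification of U+00A0. Whether the real fix belongs in the tokenizer or in the whitespace check is something I have not verified.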
------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel