Hi

Consider this very simple rule in the English grammar.xml:

<rule id="EGG_YOKE" name="egg yoke (egg yolk)">
  <pattern>
    <token>egg</token>
    <token>yoke</token>
  </pattern>

The rule works fine if the 2 words are separated by
spaces, tabs or newlines. However, it does not
work when the 2 words are separated by a non-breaking
space (U+00A0).  I wonder why.

With a normal space (U+0020):

$ echo "Egg yoke." | java -jar languagetool-commandline.jar -l en -v -
Expected text language: English (no spell checking active, specify a
language variant like 'en-GB' if available)
Working on STDIN...
1283 rules activated for language English
1283 rules activated for language English
<S> Egg[egg/NN:UN,egg/VB,egg/VBP,B-NP-singular]
yoke[yoke/NN,yoke/VB,yoke/VBP,E-NP-singular].[./.,</S>,O]<P/>
Disambiguator log:

1.) Line 1, column 1, Rule ID: EGG_YOKE[1]
Message: Did you mean 'egg yolk'?
Suggestion: Egg yolk
Egg yoke.
^^^^^^^^
Time: 2405ms for 0 sentences (0.0 sentences/sec)


With a non-breaking space (U+00A0):

$ echo "Egg yoke." | java -jar languagetool-commandline.jar -l en -v -
Expected text language: English (no spell checking active, specify a
language variant like 'en-GB' if available)
Working on STDIN...
1283 rules activated for language English
1283 rules activated for language English
<S> Egg[egg/NN:UN,egg/VB,egg/VBP,B-NP-singular] [
/null]yoke[yoke/NN,yoke/VB,yoke/VBP].[./.,</S>]<P/>

Disambiguator log:

Time: 2347ms for 0 sentences (0.0 sentences/sec)

With a non-breaking space, the debug output shows
a token [ /null], i.e. the non-breaking space is
interpreted as a word, and the rule does not match
as a result.
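
To illustrate the effect described above, here is a toy sketch in
Python. It is not LanguageTool's tokenizer or rule matcher: the names
tokenize, rule_matches and ASCII_WS are hypothetical, and the idea that
only space, tab and newline are skipped as whitespace is an assumption
made to reproduce the symptom.

```python
import re

# Assumption: only these characters are skipped between words.
ASCII_WS = frozenset({" ", "\t", "\n"})


def tokenize(text):
    """Split on non-word characters and keep each separator as a token."""
    return [t for t in re.split(r"(\W)", text) if t]


def rule_matches(text, pattern=("egg", "yoke"), whitespace=ASCII_WS):
    """Return True if the pattern words occur consecutively,
    ignoring tokens that count as whitespace."""
    words = [t.lower() for t in tokenize(text) if t not in whitespace]
    n = len(pattern)
    return any(tuple(words[i:i + n]) == tuple(pattern)
               for i in range(len(words) - n + 1))


if __name__ == "__main__":
    print(rule_matches("Egg yoke."))        # normal space
    print(rule_matches("Egg\u00a0yoke."))   # non-breaking space
    # Treating U+00A0 as whitespace as well:
    print(rule_matches("Egg\u00a0yoke.",
                       whitespace=ASCII_WS | {"\u00a0"}))
```

In this sketch the non-breaking space survives as a token of its own
between "egg" and "yoke", so the two-word pattern no longer sees the
words as adjacent. Adding U+00A0 to the whitespace set makes both
inputs behave the same.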

I think that spaces and non-breaking spaces should behave
the same in LanguageTool. The only purpose of a non-breaking
space is presentation: it prevents a line break at that
position. Nothing else should be affected by a
non-breaking space (same word count, same
word tokenization, …)
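
Until then, one possible workaround is to normalize the text before
it is checked. The following is a minimal sketch in Python, not part of
LanguageTool; normalize_spaces is a hypothetical helper, and the choice
of which characters to map (U+00A0 and U+202F) is an assumption.

```python
# Map non-breaking space characters to a normal space (U+0020).
# Assumption: U+00A0 (no-break space) and U+202F (narrow no-break
# space) are the only ones that matter here.
NO_BREAK_SPACES = {0x00A0: " ", 0x202F: " "}


def normalize_spaces(text: str) -> str:
    """Replace each non-breaking space with a normal space."""
    return text.translate(NO_BREAK_SPACES)


if __name__ == "__main__":
    print(normalize_spaces("Egg\u00a0yoke."))
```

Because each character is replaced by exactly one character, line and
column positions reported for the normalized text still apply to the
original text.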

Regards
Dominique
------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
