Daniel Naber <daniel.na...@languagetool.org> wrote:

> Hi,
>
> we have quite some changes in the nightly tests today. I'm not sure what
> the cause is, could you check your language and see if the changes are
> good or bad?
>
> https://languagetool.org/regression-tests/20151013/
>
> Regards
>   Daniel

Hi Daniel

For French, all errors are in the Java rule FRENCH_WHITESPACE.
It flags » preceded by a non-breaking space as an error.

What I find odd is that LT does not flag « followed by
a non-breaking space as an error.

$ echo '« Test ».' | java -jar languagetool-commandline.jar -l fr -v
Expected text language: French
Working on STDIN...
2527 rules activated for language French
2527 rules activated for language French
<S> «[«/null] Test[test/N m s] »[»/null].[./M fin,</S>]<P/>
Disambiguator log:

1.) Line 1, column 7, Rule ID: FRENCH_WHITESPACE
Message: Le guillemet fermant est précédé d'une espace fine insécable.
Suggestion:  »
« Test ».
      ^^
Time: 552ms for 0 sentences (0.0 sentences/sec)

Similarly, it flags as an error a non-breaking space before
a question mark or exclamation mark:

$ echo 'Test ?' | java -jar languagetool-commandline.jar -l fr -v
Expected text language: French
Working on STDIN...
2527 rules activated for language French
2527 rules activated for language French
<S> Test[test/N m s] ?[?/M fin inte,</S>]<P/>
Disambiguator log:

1.) Line 1, column 5, Rule ID: FRENCH_WHITESPACE
Message: Point d'interrogation est précédé d'une espace fine insécable.
Suggestion:  ?
Test ?
    ^^
Time: 622ms for 0 sentences (0.0 sentences/sec)

So the regression in French must be caused by the
change regarding non-breaking spaces. I wonder
why we did not already see the errors yesterday.
There seems to be a lag with continuous integration.

The rule that is broken is FRENCH_WHITESPACE in:
languagetool-language-modules/fr/src/main/java/org/languagetool/rules/fr/QuestionWhitespaceRule.java

I think that tokens[i].isWhitespaceBefore() is now true when there
is an nbsp before the token, whereas before the change it was false.
I can see that it is sometimes OK to treat nbsp as whitespace,
but not here, since the goal of the rule is precisely to check for
nbsp (either U+00A0 or U+202F). I'm not sure how to fix it. Maybe
we need an API that returns the kind of whitespace?
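
To make the idea concrete, here is a rough sketch of what such an API
could report. It is standalone Java, not existing LanguageTool code: the
class name, the enum and kindBefore() are all hypothetical, and it works
on the raw text plus a token start offset rather than on LT's token
objects.

```java
// Hypothetical helper, NOT part of the LanguageTool API.
// It classifies the single character found just before a token.
final class WhitespaceKindSketch {

  enum Kind { NONE, BREAKING, NO_BREAK_SPACE, NARROW_NO_BREAK_SPACE }

  /** Returns the kind of whitespace immediately before offset tokenStart. */
  static Kind kindBefore(String text, int tokenStart) {
    if (tokenStart <= 0 || tokenStart > text.length()) {
      return Kind.NONE;  // nothing precedes the token
    }
    char c = text.charAt(tokenStart - 1);
    if (c == '\u00A0') {
      return Kind.NO_BREAK_SPACE;
    }
    if (c == '\u202F') {
      return Kind.NARROW_NO_BREAK_SPACE;
    }
    // Character.isWhitespace() excludes the non-breaking spaces,
    // so only ordinary (breaking) whitespace reaches this branch.
    if (Character.isWhitespace(c)) {
      return Kind.BREAKING;
    }
    return Kind.NONE;
  }

  public static void main(String[] args) {
    String withNbsp = "Test\u00A0?";
    String withSpace = "Test ?";
    System.out.println(kindBefore(withNbsp, withNbsp.indexOf('?')));   // NO_BREAK_SPACE
    System.out.println(kindBefore(withSpace, withSpace.indexOf('?'))); // BREAKING
  }
}
```

With something along these lines, the rule could keep accepting
NO_BREAK_SPACE and NARROW_NO_BREAK_SPACE before ? or » and only flag
BREAKING or NONE, while other callers could still treat every kind
as plain whitespace.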

In the meantime, we may consider reverting the recent
nbsp change until we find and agree on a solution.

For Esperanto, I only see errors in rule VNETR_AKU.
It looks like the rule must be improved to avoid the false
positive, but I don't understand why we did not have the
false positives before, unless perhaps the text that we
check has changed?

For Breton, almost all new errors are in rule
UPPERCASE_SENTENCE_START. But I don't
understand why. There is not enough context
to understand the errors.

Regards
Dominique