Daniel Naber <daniel.na...@languagetool.org> wrote:

> Hi,
>
> we have quite some changes in the nightly tests today. I'm not sure what
> the cause is, could you check your language and see if the changes are
> good or bad?
>
> https://languagetool.org/regression-tests/20151013/
>
> Regards
> Daniel

Hi Daniel

For French, all errors are in the Java rule FRENCH_WHITESPACE. It flags
as an error a » preceded by a non-breaking space. What I find odd is that
LT does not flag as an error a « followed by a non-breaking space.

$ echo '« Test ».' | java -jar languagetool-commandline.jar -l fr -v
Expected text language: French
Working on STDIN...
2527 rules activated for language French
2527 rules activated for language French
<S> «[«/null] Test[test/N m s] »[»/null].[./M fin,</S>]<P/>
Disambiguator log:

1.) Line 1, column 7, Rule ID: FRENCH_WHITESPACE
Message: Le guillemet fermant est précédé d'une espace fine insécable.
Suggestion: »
« Test ».
      ^^
Time: 552ms for 0 sentences (0.0 sentences/sec)

Similarly, it flags as an error a non-breaking space before a question
mark or an exclamation mark:

$ echo 'Test ?' | java -jar languagetool-commandline.jar -l fr -v
Expected text language: French
Working on STDIN...
2527 rules activated for language French
2527 rules activated for language French
<S> Test[test/N m s] ?[?/M fin inte,</S>]<P/>
Disambiguator log:

1.) Line 1, column 5, Rule ID: FRENCH_WHITESPACE
Message: Point d'interrogation est précédé d'une espace fine insécable.
Suggestion: ?
Test ?
    ^^
Time: 622ms for 0 sentences (0.0 sentences/sec)

So the regression in French must be caused by the change regarding
non-breaking spaces. I wonder why we did not see the errors already
yesterday. There seems to be a lag with continuous integration.

The rule that is broken is FRENCH_WHITESPACE in:

languagetool-language-modules/fr/src/main/java/org/languagetool/rules/fr/QuestionWhitespaceRule.java

I think that tokens[i].isWhitespaceBefore() is now true when there is a
nbsp before the token, whereas before the change it was false. I can see
that it is sometimes OK to treat nbsp as whitespace, but not here, since
the goal of the rule is precisely to check for nbsp (either U+00A0 or
U+202F).

I'm not sure how to fix it. Maybe we need an API that returns the kind of
whitespace?

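To make the idea a bit more concrete, here is a minimal sketch of what
"returning the kind of whitespace" could look like. It is not existing
LanguageTool code: the class WhitespaceKindSketch, the enum WhitespaceKind
and the method kindBefore() are all hypothetical names, and the sketch works
on a plain String rather than on the real token classes.

```java
// Hypothetical sketch only: none of these names exist in LanguageTool.
final class WhitespaceKindSketch {

    /** Kinds of space that a typography rule may need to tell apart. */
    enum WhitespaceKind { NONE, REGULAR, NO_BREAK, NARROW_NO_BREAK }

    /**
     * Classifies the character immediately before index pos in text.
     * Returns NONE at the start of the text or if that character is
     * not a space at all.
     */
    static WhitespaceKind kindBefore(String text, int pos) {
        if (pos <= 0 || pos > text.length()) {
            return WhitespaceKind.NONE;
        }
        char c = text.charAt(pos - 1);
        switch (c) {
            case '\u00A0':   // U+00A0 NO-BREAK SPACE
                return WhitespaceKind.NO_BREAK;
            case '\u202F':   // U+202F NARROW NO-BREAK SPACE
                return WhitespaceKind.NARROW_NO_BREAK;
            default:
                // Character.isWhitespace() excludes the non-breaking
                // spaces, so they have to be handled explicitly above.
                return Character.isWhitespace(c)
                        ? WhitespaceKind.REGULAR
                        : WhitespaceKind.NONE;
        }
    }

    public static void main(String[] args) {
        // Index 4 is the position of '?' in each sample string.
        System.out.println(kindBefore("Test ?", 4));
        System.out.println(kindBefore("Test\u00A0?", 5));
        System.out.println(kindBefore("Test\u202F?", 5));
        System.out.println(kindBefore("Test?", 4));
    }
}
```

With something along these lines, a rule like FRENCH_WHITESPACE could keep
treating nbsp as whitespace in general and still ask whether the space
before a token is a regular one or a non-breaking one.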
In the meantime, we may consider reverting the recent nbsp change until
we find and agree on a solution.

For Esperanto, I only see errors in rule VNETR_AKU. It looks like the
rule must be improved to avoid the false positives, but I don't
understand why we did not have them before, unless perhaps the text that
we check has changed?

For Breton, almost all new errors are in rule UPPERCASE_SENTENCE_START,
but I don't understand why. There is not enough context to understand
the errors.

Regards
Dominique
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel