2015-06-08 9:39 GMT+02:00 Daniel Naber <daniel.na...@languagetool.org>:
> On 2015-06-02 15:06, Jaume Ortolà i Font wrote:
>
> Hi Jaume,
>
> sorry for the late reply.
>
> > There are some failures with the current German LanguageTool tests.
> > Could you take a look, Daniel? You need to use replacements in
> > lower-case (r rh, rh r). Are the results reasonable?
>
> This case looks like a regression to me:
>
> Not found: 'Haus' in: [Hauch, Hau, Haue, Haut, -Au, -Aue, -Aug, -Haus,
> -Haut, Ahaus, Back, Baku, Bank, Bark, Bau, Bau-, Baud, Baum, Baus,
> Chauke]
>
> As long as there's a suggestion with a distance of 1, shouldn't it be
> preferred over suggestions with a distance of 2?
>
> For the case "Ligafußboll", the suggestion with a distance of 2 seems to
> be lost, I think that shouldn't be the case:
>
> Expected :[Ligafußball, Ligafußballs]
> Actual :[Ligafußball]
>

You are right. These results are not expected. I will look at them again.

A question: "Ligafußball" doesn't exist as a word in the dictionary. It's a compound, isn't it?

> > If the preferred option in German is convert-case=false, then my
> > changes will not affect the German tests in any way.
>
> Could you describe what exactly convert-case does, I'm not sure I
> completely understand it.
>

It is the same for replacement-pairs, convert-case and ignore-diacritics. If any of these features is enabled, then these differences add a distance of 0 between the original word and the possible suggestion.

Examples:

If "ss ß" is in replacement-pairs, the distance between Ligafussball (original wrong word) and Ligafußball (suggestion) is zero.
If convert-case=true, the distance between ligafußball (original word) and Ligafußball (suggestion) is zero.
If ignore-diacritics=true, the distance between horen (original word) and hören (suggestion) is zero.
If ignore-diacritics=true, the distance between horem (original word) and hören (suggestion) is one (not two).

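To make the four examples concrete, here is a minimal Python sketch of an edit distance in which the enabled features cost nothing. It is only an illustration of the rule described above, not the morfologik Speller code: the function names (`distance`, `chars_equivalent`, `strip_diacritics`) and the parameter names are made up for this example, and it uses a plain Levenshtein-style table rather than the automaton traversal the Speller performs.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, e.g. 'ö' becomes 'o' ('ß' is left unchanged)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def chars_equivalent(a, b, convert_case, ignore_diacritics):
    """True if a and b differ only by features that are enabled (hypothetical helper)."""
    if ignore_diacritics:
        a, b = strip_diacritics(a), strip_diacritics(b)
    if convert_case:
        a, b = a.lower(), b.lower()
    return a == b


def distance(word, candidate, replacement_pairs=(),
             convert_case=False, ignore_diacritics=False):
    """Edit distance where enabled features contribute a cost of 0."""
    # Use precomposed characters so that 'ö' counts as one character.
    word = unicodedata.normalize("NFC", word)
    candidate = unicodedata.normalize("NFC", candidate)
    n, m = len(word), len(candidate)
    unreachable = n + m + 1
    d = [[unreachable] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            cur = d[i][j]
            if cur == unreachable:
                continue
            if i < n:  # delete a character of the original word
                d[i + 1][j] = min(d[i + 1][j], cur + 1)
            if j < m:  # insert a character of the candidate
                d[i][j + 1] = min(d[i][j + 1], cur + 1)
            if i < n and j < m:  # match or substitute
                same = chars_equivalent(word[i], candidate[j],
                                        convert_case, ignore_diacritics)
                d[i + 1][j + 1] = min(d[i + 1][j + 1], cur + (0 if same else 1))
            # A replacement pair rewrites src as dst at no cost.
            for src, dst in replacement_pairs:
                if word.startswith(src, i) and candidate.startswith(dst, j):
                    ti, tj = i + len(src), j + len(dst)
                    d[ti][tj] = min(d[ti][tj], cur)
    return d[n][m]


print(distance("Ligafussball", "Ligafußball", replacement_pairs=[("ss", "ß")]))
print(distance("ligafußball", "Ligafußball", convert_case=True))
print(distance("horen", "hören", ignore_diacritics=True))
print(distance("horem", "hören", ignore_diacritics=True))
```

With every feature switched off, the same function falls back to the ordinary distance, so horem and hören are two edits apart instead of one.
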
In the file de_DE.info you wrote:

# ignore-diacritics=false speeds up building the suggestions by a factor of about 2:

Is that still true with the current Speller code?

A question for Marcin: As you can see here [1], the condition isConvertingCase() is inside the condition isIgnoringDiacritics(), so the two options are not independent. Was this done on purpose? Should we correct it?

Currently, for German, convert-case is true (by default) and ignore-diacritics is false. "convert-case=true" is necessary so that capitalized words (for example, at the start of a sentence) are not marked as errors. But when the Speller looks for suggestions, convert-case=true is ignored because ignore-diacritics=false.

Regards,
Jaume

[1] https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-speller/src/main/java/morfologik/speller/Speller.java#L601
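
The dependency between the two options can be illustrated with a small Python sketch. This is not the code at [1]; it only models the nesting as described in this message, and the function names (`equivalent_nested`, `equivalent_independent`) are invented for the illustration.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, e.g. 'ö' becomes 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def equivalent_nested(a, b, convert_case, ignore_diacritics):
    """Model of the nesting described above: the case check sits inside
    the diacritics check, so it is never reached when
    ignore_diacritics is False."""
    if a == b:
        return True
    if ignore_diacritics:
        a, b = strip_diacritics(a), strip_diacritics(b)
        if a == b:
            return True
        if convert_case:
            return a.lower() == b.lower()
    return False


def equivalent_independent(a, b, convert_case, ignore_diacritics):
    """Alternative in which each option is applied on its own."""
    if ignore_diacritics:
        a, b = strip_diacritics(a), strip_diacritics(b)
    if convert_case:
        a, b = a.lower(), b.lower()
    return a == b


# German settings from this message: convert-case=true, ignore-diacritics=false.
print(equivalent_nested("l", "L", convert_case=True, ignore_diacritics=False))
print(equivalent_independent("l", "L", convert_case=True, ignore_diacritics=False))
```

Under the German settings the nested variant treats "l" and "L" as different characters, while the independent variant treats them as equal. Whether the independent behaviour is the desired one is the open question for Marcin.
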
------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel