On 2015-08-06 11:31, Dawid Weiss wrote:

>> There are other issues where I could need help from an expert. For
>> example, results don't get better when we use 4grams instead of
>> 3grams.
>
> This is, I think, a general conclusion from using shingles of any data
> -- if you're increasing their lengths you also increase the sparsity
> of the model space.

I see, but the way we calculate the probability also includes the
lower-order ngrams:

https://github.com/languagetool-org/languagetool/blob/master/languagetool-core/src/main/java/org/languagetool/rules/ConfusionProbabilityRule.java#L309

I'm not sure whether this is the right thing to do, hence the idea
that someone could review the code.

Regards
 Daniel
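
To make the question concrete, here is a minimal sketch of one common way to let lower-order ngrams contribute to a score: linear interpolation of maximum-likelihood estimates. This is not the code behind the ConfusionProbabilityRule link and may differ from it in every detail. The class name, the in-memory count map, and the weights are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; NOT the LanguageTool implementation.
final class InterpolatedNgramSketch {

    // Hypothetical in-memory count store; the key is the ngram joined by spaces.
    private final Map<String, Long> counts = new HashMap<>();
    private long totalTokens = 0;

    // Counts all 1- to 4-grams of one tokenized sentence.
    void add(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            totalTokens++;
            for (int n = 1; n <= 4 && i + n <= tokens.size(); n++) {
                String key = String.join(" ", tokens.subList(i, i + n));
                counts.merge(key, 1L, Long::sum);
            }
        }
    }

    long count(List<String> ngram) {
        return counts.getOrDefault(String.join(" ", ngram), 0L);
    }

    // Maximum-likelihood estimate of P(last word | preceding words).
    // Returns 0 when the context was never seen (the sparsity case).
    double mle(List<String> ngram) {
        if (ngram.size() == 1) {
            return totalTokens == 0 ? 0.0 : (double) count(ngram) / totalTokens;
        }
        long context = count(ngram.subList(0, ngram.size() - 1));
        return context == 0 ? 0.0 : (double) count(ngram) / context;
    }

    // Linear interpolation: weights[0] applies to the full ngram,
    // weights[1] to the ngram without its first word, and so on
    // down to the unigram. weights needs at least ngram.size() entries.
    double interpolated(List<String> ngram, double[] weights) {
        double p = 0.0;
        for (int k = 0; k < ngram.size(); k++) {
            p += weights[k] * mle(ngram.subList(k, ngram.size()));
        }
        return p;
    }
}
```

With the toy sentences "a b c d" and "a b c e" and weights {0.4, 0.3, 0.2, 0.1}, the seen 4gram "a b c d" scores 0.4625, while the unseen 4gram "x b c d" still scores 0.2625 purely from its 3gram, 2gram and 1gram parts. In a scheme like this, a sparse 4gram level adds little on top of the lower orders, which would be consistent with 4grams not improving results.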