On 2015-08-06 11:31, Dawid Weiss wrote:

>> There are other issues where I could need help from an expert. For
>> example, results don't get better when we use 4grams instead of 
>> 3grams.
> 
> This is, I think, a general conclusion from using shingles of any data
> -- if you're increasing their lengths you also increase the sparsity
> of the model space.

I see, but the way we calculate the probability also includes the
lower-order n-grams:

https://github.com/languagetool-org/languagetool/blob/master/languagetool-core/src/main/java/org/languagetool/rules/ConfusionProbabilityRule.java#L309

I'm not sure whether this is the right thing to do, hence the idea that
someone could review the code.
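
To make the question concrete, here is a minimal sketch of one common way
to fold lower-order n-grams into the probability: plain linear
interpolation of maximum-likelihood estimates. This is an assumption for
illustration only and is not necessarily what ConfusionProbabilityRule
does at the linked line; the class name, the in-memory count map and the
weights are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, NOT the LanguageTool implementation.
class NgramInterpolationSketch {

    private static final int MAX_ORDER = 4;

    // n-gram (tokens joined by a space) -> occurrence count
    private final Map<String, Long> counts = new HashMap<>();
    private long totalTokens = 0;

    // Count all 1- to 4-grams of one tokenized sentence.
    public void add(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            totalTokens++;
            for (int n = 1; n <= MAX_ORDER && i + n <= tokens.size(); n++) {
                String key = String.join(" ", tokens.subList(i, i + n));
                counts.merge(key, 1L, Long::sum);
            }
        }
    }

    public long count(List<String> ngram) {
        return counts.getOrDefault(String.join(" ", ngram), 0L);
    }

    // Maximum-likelihood estimate of P(last token | preceding tokens).
    // Returns 0 when the context was never seen.
    public double mle(List<String> ngram) {
        if (ngram.size() == 1) {
            return totalTokens == 0 ? 0.0 : (double) count(ngram) / totalTokens;
        }
        long contextCount = count(ngram.subList(0, ngram.size() - 1));
        return contextCount == 0 ? 0.0 : (double) count(ngram) / contextCount;
    }

    // weights[0] applies to the full n-gram, weights[1] to the n-gram
    // with its first token dropped, and so on down to the unigram.
    public double interpolated(List<String> ngram, double[] weights) {
        double p = 0.0;
        for (int k = 0; k < weights.length && k < ngram.size(); k++) {
            p += weights[k] * mle(ngram.subList(k, ngram.size()));
        }
        return p;
    }
}
```

With a scheme like this, a 4gram that was never seen still gets a nonzero
score from its 3gram, 2gram and 1gram parts, so the sparsity of the
longest order is softened rather than removed. Whether the weights are
tuned or fixed would then decide how much the 4grams can actually help.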

Regards
  Daniel


------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
