Hi, even though I don't speak French, I've started adding confusion pairs for French. Here's an example from fr/confusion_sets.txt:

    quand; quant; 1000000 # p=1.000, r=0.662, 186+988, 3grams, 2016-03-29

This means that whenever 'quand' appears, LT uses Google ngrams[1] to check whether 'quant' is more probable in that context, and vice versa. '1000000' is a factor to avoid false alarms. p=1.000, r=0.662 means that, with my evaluation set, this pair has a precision of 1, i.e. it doesn't produce any false alarms, and a recall of 0.662, i.e. 66.2% of all errors are detected.

So far, there are only 9 pairs like this (pris/prix, don/donc, dans/dent etc.), but I'm going to add more. I'll do the same for Spanish. Feel free to add pairs, too. You can check how well a pair works (and find a good factor with a low false alarm rate) using ConfusionRuleEvaluator from the languagetool-dev module.

Regards
 Daniel

[1] http://wiki.languagetool.org/finding-errors-using-n-gram-data

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
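
Appended note: as a rough illustration of the confusion pair line and the factor described in the message, here is a minimal Python sketch. It is not LT's actual code. The names ConfusionPair, parse_confusion_line and should_flag are hypothetical, the probabilities are expected to come from some ngram lookup that is not shown, and the rule "flag only when the alternative is at least factor times more probable" is an assumption about how the factor is applied.

```python
from dataclasses import dataclass


@dataclass
class ConfusionPair:
    """One entry of a confusion_sets.txt-style file (hypothetical type)."""
    word_a: str
    word_b: str
    factor: int


def parse_confusion_line(line: str) -> ConfusionPair:
    """Parse a line of the form 'word_a; word_b; factor # comment'."""
    # Everything after '#' is evaluation metadata and is ignored here.
    data = line.split("#", 1)[0]
    word_a, word_b, factor = (field.strip() for field in data.split(";"))
    return ConfusionPair(word_a, word_b, int(factor))


def should_flag(pair: ConfusionPair, p_written: float, p_alternative: float) -> bool:
    """Decide whether to suggest the alternative word.

    Assumption: the alternative is suggested only if its ngram probability
    is at least `factor` times the probability of the word as written.
    A larger factor therefore means fewer false alarms but lower recall.
    """
    return p_alternative >= pair.factor * p_written


if __name__ == "__main__":
    line = "quand; quant; 1000000 # p=1.000, r=0.662, 186+988, 3grams, 2016-03-29"
    pair = parse_confusion_line(line)
    print(pair)
    # The probabilities below are made-up inputs, not real ngram data.
    print(should_flag(pair, p_written=1e-12, p_alternative=1e-5))
    print(should_flag(pair, p_written=1e-8, p_alternative=1e-5))
```

With a factor this large, the alternative has to be overwhelmingly more likely before anything is reported, which matches the trade-off in the example line: perfect precision at the cost of some recall.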