Hi,

some time ago I added a rule for English that detects errors statistically, using large ngram data sets. I've now activated the rule for all languages that we have data for: Chinese, French, Italian, Russian, and Spanish (German has already been active for some time).
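To illustrate the general idea only: a statistical check of this kind can compare how often each of two easily confused words occurs in the same context and flag the word that is far less frequent. The sketch below is not LT's actual implementation. The trigram counts are invented, and the names `TOY_COUNTS`, `suggest` and the `factor` threshold are hypothetical.

```python
from typing import Dict, Optional, Tuple

Trigram = Tuple[str, str, str]

# Toy trigram counts, invented for illustration only (not real ngram data).
TOY_COUNTS: Dict[Trigram, int] = {
    ("over", "there", "is"): 900,
    ("over", "their", "is"): 3,
    ("lost", "their", "keys"): 400,
    ("lost", "there", "keys"): 1,
}


def suggest(left: str, word: str, right: str,
            pair: Tuple[str, str],
            factor: int = 10,
            counts: Dict[Trigram, int] = TOY_COUNTS) -> Optional[str]:
    """Return the other word of the pair if it is much more frequent
    in this context than the word actually used, else None."""
    if word not in pair:
        return None
    other = pair[1] if word == pair[0] else pair[0]
    seen = counts.get((left, word, right), 0)
    alternative = counts.get((left, other, right), 0)
    # Hypothetical threshold: the alternative must be at least `factor`
    # times more frequent. Adding 1 avoids flagging on very small counts.
    if alternative > factor * (seen + 1):
        return other
    return None


if __name__ == "__main__":
    pair = ("their", "there")
    print(suggest("over", "their", "is", pair))    # alternative is far more frequent
    print(suggest("lost", "their", "keys", pair))  # usage matches the counts
```

With real data the counts would come from the downloaded ngram index rather than a dictionary, and how strict the threshold should be is something to tune per word pair.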
That means rule developers can add word pairs to the 'confusion_sets.txt' file, and LT will try to detect wrong usage of either word of the pair. Here's how you can use this approach to detect errors:

1.) Download the (large) data for your language from http://languagetool.org/download/ngram-data/untested/
2.) Follow the documentation at http://wiki.languagetool.org/adding-n-gram-data-rules

This is not a general replacement for writing rules manually, but it's often easier and it sometimes works better. In my experience, it's hard to tell which word pairs work well with this approach; it's something one just has to experiment with.

Please give it a try and let me know if you have feedback or questions.

Regards
 Daniel

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel