Hi, I've updated the ngram data that powers our English homophone confusion rule. This is the rule that finds many cases where words like there/their, breathe/breath, etc. are confused. The new ngram data is based on the Google ngram data from 2012. It's much larger than the previous version, so it should improve the quality of the error detection.
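As a rough illustration of the general idea, here is a toy sketch of how ngram counts can decide between two confusable words. It is not LanguageTool's actual implementation (that is described in the wiki article on the ngram check). The names `TRIGRAM_COUNTS`, `CONFUSION_PAIRS`, `alternative`, `check` and the `factor` threshold are hypothetical, and the counts are made up for illustration. The sketch looks at the word's left and right neighbors and suggests the alternative only if it is much more frequent in that same context.

```python
from typing import Dict, Optional, Sequence, Tuple

Trigram = Tuple[str, str, str]

# Hypothetical toy counts for illustration only; a real rule would use the
# (much larger) Google ngram data, not these numbers.
TRIGRAM_COUNTS: Dict[Trigram, int] = {
    ("over", "there", "is"): 9000,
    ("over", "their", "is"): 12,
    ("is", "their", "car"): 40,
    ("is", "there", "car"): 3,
}

# Pairs of words that are commonly confused with each other.
CONFUSION_PAIRS = [("there", "their"), ("breathe", "breath")]


def count(trigram: Trigram) -> int:
    """Look up a trigram count, treating unseen trigrams as 0."""
    return TRIGRAM_COUNTS.get(trigram, 0)


def alternative(word: str) -> Optional[str]:
    """Return the other member of the confusion pair containing `word`, if any."""
    for a, b in CONFUSION_PAIRS:
        if word == a:
            return b
        if word == b:
            return a
    return None


def check(tokens: Sequence[str], i: int, factor: int = 10) -> Optional[str]:
    """Suggest a replacement for tokens[i] if the alternative is at least
    `factor` times more frequent in the same left/right context.
    Requires one neighbor on each side of position i."""
    if i <= 0 or i >= len(tokens) - 1:
        return None
    word = tokens[i]
    alt = alternative(word)
    if alt is None:
        return None
    left, right = tokens[i - 1], tokens[i + 1]
    # Add-one smoothing so an unseen trigram does not zero out the comparison.
    original = count((left, word, right)) + 1
    candidate = count((left, alt, right)) + 1
    return alt if candidate > factor * original else None


print(check(["over", "their", "is"], 1))  # there
print(check(["over", "there", "is"], 1))  # None
```

Comparing counts in a fixed context keeps the decision local and cheap; the cost is that the quality depends heavily on how much ngram data is available, which is why a larger data set should help.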
The new data has been active on languagetool.org for about three weeks and seems to work fine. It is now also available for download. More data also means a larger download: it is now 8 GB. You can get it at http://languagetool.org/download/ngram-data/

If you want to use this data but cannot download that much, you can use our HTTP API, which also uses the latest ngram data (http://wiki.languagetool.org/public-http-api).

More background information about our ngram check is available at http://wiki.languagetool.org/finding-errors-using-n-gram-data

Regards
Daniel

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel