updated ngram data

Daniel Naber Mon, 17 Aug 2015 02:50:04 -0700

Hi,

I've updated the ngram data that powers our English homophone 
confusion rule. This is the rule that finds many cases where words like 
there/their, breathe/breath, etc. are confused. The new ngram data is 
based on the Google ngram data from 2012. It is much larger than the 
previous version and should thus improve the quality of the error 
detection.


The new data has been active on languagetool.org for about 3 weeks and 
seems to work fine. It is now also available for download. More data 
means a larger download; the data is now 8 GB:

http://languagetool.org/download/ngram-data/

If you want to make use of this but cannot download this much, you can 
use our HTTP API, which also uses the latest ngram data 
(http://wiki.languagetool.org/public-http-api).

More background information about our ngram check is available at
http://wiki.languagetool.org/finding-errors-using-n-gram-data
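
As a rough illustration of the idea behind such a check (a toy sketch, 
not LanguageTool's actual implementation): for a word that belongs to a 
confusion pair, compare how often the surrounding three-word phrase 
occurs with each candidate and suggest the alternative only if it is 
much more frequent. The counts, the confusion pairs, the function name 
`suggest` and the `factor` threshold below are all invented for the 
example; the real rule reads its counts from the downloaded ngram data.

```python
# Toy trigram counts, invented for illustration only.
TRIGRAM_COUNTS = {
    ("over", "there", "is"): 1200,
    ("over", "their", "is"): 3,
    ("lost", "their", "keys"): 900,
    ("lost", "there", "keys"): 2,
}

# Hypothetical confusion pairs: each word maps to its easily confused twin.
CONFUSION_PAIRS = {"there": "their", "their": "there"}


def suggest(tokens, i, counts=TRIGRAM_COUNTS, factor=10):
    """Return the alternative for tokens[i] if its trigram is at least
    `factor` times more frequent than the one actually written, else None."""
    word = tokens[i]
    alt = CONFUSION_PAIRS.get(word)
    # This sketch needs one neighbour on each side of the word.
    if alt is None or i == 0 or i == len(tokens) - 1:
        return None
    left, right = tokens[i - 1], tokens[i + 1]
    seen = counts.get((left, word, right), 0)
    alt_seen = counts.get((left, alt, right), 0)
    # Add-one smoothing avoids dividing by zero for unseen trigrams.
    if (alt_seen + 1) / (seen + 1) >= factor:
        return alt
    return None


print(suggest(["they", "lost", "there", "keys"], 2))
print(suggest(["over", "there", "is"], 1))
```

With these toy counts, the first call suggests "their" and the second 
leaves the text alone. The threshold keeps the check from flagging 
a word when both variants are about equally plausible in context.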

Regards
  Daniel


------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
