Ruud Baars wrote:
> Hi, I tried to read the documentation, but it is very technical. Not a
> word about what it is really able to do, or how it is trained.
>
> Would you know where I could find some info at a non-programmer level?
> I need to find out how well it can distinguish between
> old-fashioned Dutch, German, Afrikaans, Frisian etc., for better
> filtering of a corpus.
Hi Ruud,

This article explains the inner workings of the Tika Language Identifier:
http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html

AFAIK, most language identification tools are based on the algorithm described in this paper:
William B. Cavnar, John M. Trenkle: N-Gram-Based Text Categorization
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9367

Hope this helps. :)

-- 
"The only function of economic forecasting is to make astrology look respectable."
-- J.K. Galbraith.

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
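To make the "how is it trained" part of the question concrete, here is a minimal sketch of the n-gram rank-order ("out-of-place") method from the Cavnar and Trenkle paper mentioned above. It is not Tika's actual code. The names `ngram_profile`, `out_of_place` and `identify`, the simplified word padding, and the toy sample sentences are all assumptions made for illustration. The n-gram lengths (1 to 5) and the 300-entry profile size only roughly follow the paper. In this scheme, "training" just means building one ranked list of frequent character n-grams per language from sample text. A new text is then assigned to the language whose list ranks the same n-grams most similarly.

```python
from collections import Counter

# Minimal sketch of the Cavnar & Trenkle (1994) "out-of-place" method.
# Assumptions: n-grams of length 1..5 and a 300-entry profile roughly follow
# the paper; word padding is simplified to one "_" on each side.
MAX_N = 5
PROFILE_SIZE = 300


def ngram_profile(text, max_n=MAX_N, size=PROFILE_SIZE):
    """Return the `size` most frequent character n-grams, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark the word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending count; ties broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:size]]


def out_of_place(doc_profile, lang_profile, penalty=PROFILE_SIZE):
    """Sum of rank differences; n-grams absent from lang_profile cost `penalty`."""
    lang_rank = {gram: rank for rank, gram in enumerate(lang_profile)}
    return sum(
        abs(rank - lang_rank[gram]) if gram in lang_rank else penalty
        for rank, gram in enumerate(doc_profile)
    )


def identify(text, lang_profiles):
    """Return the language whose profile has the smallest distance to `text`."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles,
               key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


if __name__ == "__main__":
    # "Training" is just building one profile per language from sample text.
    # These two toy samples are hypothetical placeholders, far too small for real use.
    samples = {
        "nl": "de kinderen spelen in de tuin en het huis is groot",
        "de": "die kinder spielen im garten und das haus ist gross",
    }
    profiles = {lang: ngram_profile(text) for lang, text in samples.items()}
    print(identify("het huis en de tuin", profiles))
```

With a sketch like this, how well closely related languages such as Dutch, Afrikaans and Frisian can be told apart depends largely on how much representative sample text each profile is built from. Old-fashioned spelling would need its own samples to be recognized reliably.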