Ruud Baars wrote:
> Hi, I tried to read the documentation, but it is very technical. Not a 
> word about what it is really able to do, or how it is trained.
> 
> Would you know where I could find some info at a non-programmer level?
> I need to find out how well it can distinguish between 
> old-fashioned Dutch, German, Afrikaans, Frisian etc., for better 
> filtering of a corpus.

Hi Ruud,

In this article you can find an explanation of the inner workings of
the Tika Language Identifier:

http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html

AFAIK, most language identification tools are based on the algorithm
described in this paper:

William B. Cavnar, John M. Trenkle: N-Gram-Based Text Categorization
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9367
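
To give a rough idea of how that kind of algorithm works, here is a minimal
Python sketch of the general approach as I understand it: build a ranked list
of the most frequent character n-grams for each language, do the same for the
text to classify, and pick the language whose ranking is closest (the
"out-of-place" measure). The function names, the single-underscore word
padding and the default parameters are my own assumptions for illustration;
this is not the code used by Tika or LanguageTool.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    # Split into runs of letters; digits and punctuation are ignored.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"  # simplified padding: one marker on each side
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def identify(text, lang_profiles):
    """Return the key of the language profile with the smallest distance to text."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Toy usage: real profiles would be built from much larger training texts.
profiles = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "nl": ngram_profile("de snelle bruine vos springt over de luie hond en de kat"),
}
print(identify("de snelle bruine vos springt over de luie hond en de kat", profiles))
```

"Training" here just means computing such a profile from sample text in each
language, so the quality of the distinction depends on the samples you feed it.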

Hope this helps. :)

-- 
"The only function of economic forecasting is to make astrology look
respectable." -- J.K. Galbraith.

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
