Thanks, it helps a little. My problem is the poor quality of detection for Dutch, maybe because of bad training data.
Training with better data than Wikipedia would probably help. A wiki focuses on out-of-the-ordinary subjects, many of them foreign or specialized, so a wiki is poor training material right from the start. That is why I am curious how the detector is trained and used.

Ruud

On 17-11-12 18:05, Susana Sotelo Docio wrote:
> Ruud Baars wrote:
>> Hi, I tried to read the documentation, but it is very technical. Not a
>> word about what the tool is really able to do, or how it is trained.
>>
>> Would you know where I could find some information at a non-programmer level?
>> I need to find out the quality of distinction possible between
>> old-fashioned Dutch, German, Afrikaans, Frisian etc., for better
>> filtering of a corpus.
> Hi Ruud,
>
> in this article you can find an explanation of the inner workings of
> the Tika Language Identifier:
>
> http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html
>
> AFAIK, most language identification tools are based on the algorithm
> described in this paper:
>
> William B. Cavnar, John M. Trenkle: N-Gram-Based Text Categorization
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9367
>
> Hope this helps. :)
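For what it's worth, the Tika identifier described in that IBM article can be tried out from a few lines of Java. The snippet below assumes the Tika 1.x class org.apache.tika.language.LanguageIdentifier (treat the exact class and method names as an assumption if you are on another version), and the sample sentence is just an illustration:

import org.apache.tika.language.LanguageIdentifier;

public class TikaLanguageCheck {
    public static void main(String[] args) {
        // Builds an n-gram profile of the input text and compares it
        // against Tika's precomputed per-language profiles.
        LanguageIdentifier identifier =
                new LanguageIdentifier("Dit is een korte Nederlandse zin over alledaagse dingen.");
        System.out.println("detected language:  " + identifier.getLanguage());
        System.out.println("reasonably certain: " + identifier.isReasonablyCertain());
    }
}

The isReasonablyCertain() flag is a quick way to spot texts that none of the built-in profiles match well, for example very short fragments.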
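And for how the Cavnar & Trenkle algorithm itself works, here is a minimal, self-contained sketch of the rank-order ("out-of-place") n-gram comparison. It is simplified to trigrams only (the paper uses 1- to 5-grams), and the class name, the tiny inline training strings, and the profile size of 300 are made up purely for illustration; a real identifier builds its profiles from much larger corpora.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough sketch of the Cavnar & Trenkle rank-order n-gram comparison.
public class NGramGuess {

    // Rank the most frequent character trigrams of a text.
    static List<String> profile(String text, int size) {
        String s = " " + text.toLowerCase().replaceAll("\\s+", " ") + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> counts.get(b) - counts.get(a));  // most frequent first
        return ranked.subList(0, Math.min(size, ranked.size()));
    }

    // "Out-of-place" distance: sum of rank differences; trigrams missing
    // from the language profile get a fixed maximum penalty.
    static int distance(List<String> doc, List<String> lang) {
        int dist = 0;
        for (int i = 0; i < doc.size(); i++) {
            int j = lang.indexOf(doc.get(i));
            dist += (j < 0) ? lang.size() : Math.abs(i - j);
        }
        return dist;
    }

    public static void main(String[] args) {
        // Far-too-small training samples, for illustration only.
        Map<String, String> training = new HashMap<>();
        training.put("nl", "dit is een korte nederlandse tekst over gewone alledaagse dingen");
        training.put("de", "dies ist ein kurzer deutscher text über gewöhnliche alltägliche dinge");

        Map<String, List<String>> profiles = new HashMap<>();
        for (Map.Entry<String, String> e : training.entrySet()) {
            profiles.put(e.getKey(), profile(e.getValue(), 300));
        }

        // Classify a new text by picking the language profile with the smallest distance.
        List<String> doc = profile("een stukje tekst in het nederlands", 300);
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : profiles.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDist) { bestDist = d; best = e.getKey(); }
        }
        System.out.println("best guess: " + best);
    }
}

The quality of the guess stands or falls with how representative the training texts are of what you feed it, which is exactly the Wikipedia concern above: profiles built from encyclopedia prose will rank n-grams differently from everyday Dutch, old-fashioned Dutch or Frisian.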