Hello,

Using MarkLogic 7.0-3, I gave the above function more than 3,500 characters of Latvian news text. Its top three results are Croatian twice and Serbian once:
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding>
  <language>hr</language>
  <score>7.081</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding>
  <language>hr</language>
  <score>7.012</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding>
  <language>sr</language>
  <score>6.882</score>
</encoding-language>
...

Latvian does not appear in the results at all. Google Translate and detectlanguage.com both identify the text correctly, and with high confidence.

Can someone explain why the detection is wrong and why the function is so unsure of its answer? Do I need a particular language pack? (I'm experimenting with the developer licence, which I thought was full-featured.) Or does the detector need training? The documentation doesn't mention either.

Thanks!

Cheers,
Jakob
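For anyone who wants to inspect the scores outside MarkLogic, here is a minimal Python sketch. It is an illustration, not MarkLogic code. It parses the three <encoding-language> elements quoted above and ranks them by score. The <results> wrapper element and the function name ranked_guesses are hypothetical, added only so the snippet is self-contained. It shows that the top three candidates lie within about 0.2 points of each other and that Latvian (lv) is absent.

```python
import xml.etree.ElementTree as ET

# Namespace used by the quoted <encoding-language> elements.
NS = "xdmp:encoding-language-detect"

# The three elements quoted above, wrapped in a hypothetical <results>
# root element so they parse as a single XML document.
RAW = f"""<results>
<encoding-language xmlns="{NS}"><encoding>utf-8</encoding><language>hr</language><score>7.081</score></encoding-language>
<encoding-language xmlns="{NS}"><encoding>utf-8</encoding><language>hr</language><score>7.012</score></encoding-language>
<encoding-language xmlns="{NS}"><encoding>utf-8</encoding><language>sr</language><score>6.882</score></encoding-language>
</results>"""


def ranked_guesses(xml_text):
    """Return (language, score) pairs sorted by descending score."""
    root = ET.fromstring(xml_text)
    guesses = []
    for el in root.findall(f"{{{NS}}}encoding-language"):
        lang = el.findtext(f"{{{NS}}}language")
        score = float(el.findtext(f"{{{NS}}}score"))
        guesses.append((lang, score))
    return sorted(guesses, key=lambda g: g[1], reverse=True)


guesses = ranked_guesses(RAW)
print(guesses)
# Spread between the best and the third-best candidate.
print(round(guesses[0][1] - guesses[-1][1], 3))
```

A small spread like this means no candidate clearly stands out. That matches the low confidence described above.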
