Hello, I think this message got lost when the mailing list was down in February (or nobody has an answer ...)
Thanks,
Jakob.

---------- Forwarded message ----------
From: Jakob Fix <[email protected]>
Date: Sat, Feb 28, 2015 at 10:59 PM
Subject: question about xdmp:encoding-language-detect
To: General Mark Logic Developer Discussion <[email protected]>

Hello,

using ML 7.0-3, the above function, given more than 3500 characters of Latvian news story text, returns Croatian twice and Serbian once in the top three results:

<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding>
  <language>hr</language>
  <score>7.081</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding>
  <language>hr</language>
  <score>7.012</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding>
  <language>sr</language>
  <score>6.882</score>
</encoding-language>

... and no Latvian in sight. Google Translate and detectlanguage.com both return the correct result, and with sufficient confidence.

Can someone explain the reason behind this lack of confidence and the wrong detection? Do you need the right language pack (I'm playing around with the developer licence, which I thought was full-featured)? Is this something that needs training? The documentation doesn't say so.

Thanks!

cheers,
Jakob.
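To make the reported scores easier to compare, here is a minimal Python sketch (not MarkLogic code) that parses the three result elements quoted above and lists the best score per language. The <results> wrapper element and the helper names parse_results and best_per_language are assumptions made for illustration only; they are not part of the xdmp:encoding-language-detect output or of any MarkLogic API.

```python
import xml.etree.ElementTree as ET

# Namespace used by the result elements quoted in the message.
NS = {"d": "xdmp:encoding-language-detect"}

# The three results from the message. The <results> wrapper is hypothetical,
# added only so that the sequence parses as a single XML document.
RAW = """<results>
  <encoding-language xmlns="xdmp:encoding-language-detect">
    <encoding>utf-8</encoding><language>hr</language><score>7.081</score>
  </encoding-language>
  <encoding-language xmlns="xdmp:encoding-language-detect">
    <encoding>utf-8</encoding><language>hr</language><score>7.012</score>
  </encoding-language>
  <encoding-language xmlns="xdmp:encoding-language-detect">
    <encoding>utf-8</encoding><language>sr</language><score>6.882</score>
  </encoding-language>
</results>"""


def parse_results(xml_text):
    """Return (language, score) pairs in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.findall("d:encoding-language", NS):
        language = el.findtext("d:language", namespaces=NS)
        score = float(el.findtext("d:score", namespaces=NS))
        pairs.append((language, score))
    return pairs


def best_per_language(pairs):
    """Keep only the highest score seen for each language code."""
    best = {}
    for language, score in pairs:
        if language not in best or score > best[language]:
            best[language] = score
    return best


if __name__ == "__main__":
    best = best_per_language(parse_results(RAW))
    for language, score in sorted(best.items(), key=lambda kv: -kv[1]):
        print(language, score)
    # "lv" (Latvian) does not appear among the quoted candidates at all.
    print("lv present:", "lv" in best)
```

The sketch only restates what the quoted output already shows: the candidates score within a narrow band of each other, and Latvian is not among them.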
