On Fri, 27 Mar 2015 08:23:19 -0700, Jakob Fix <[email protected]> wrote:
> Hello, I think this message got lost when the mailing list was down in
> February (or nobody has an answer ...)
>
> Thanks,
> Jakob.

The xdmp:encoding-language-detect uses the ICU libraries to do the
detection. Serbian and Croatian are very closely related to each other
and have some similar orthography to Latvian (although not a great deal
of linguistic similarity, it must be said). I think the ICU libraries
probably lack some of the linguistic sophistication of Google's backend.
It has nothing to do with the licensing options.

//Mary

>
> ---------- Forwarded message ----------
> From: Jakob Fix <[email protected]>
> Date: Sat, Feb 28, 2015 at 10:59 PM
> Subject: question about xdmp:encoding-language-detect
> To: General Mark Logic Developer Discussion
> <[email protected]>
>
>
> Hello,
>
> using ML7.0-3, the above function, given more than 3500 characters of
> Latvian news story text, returns Croatian twice and Serbian once in
> the top three results:
>
> <encoding-language xmlns="xdmp:encoding-language-detect">
>   <encoding>utf-8</encoding>
>   <language>hr</language>
>   <score>7.081</score>
> </encoding-language>
> <encoding-language xmlns="xdmp:encoding-language-detect">
>   <encoding>utf-8</encoding>
>   <language>hr</language>
>   <score>7.012</score>
> </encoding-language>
> <encoding-language xmlns="xdmp:encoding-language-detect">
>   <encoding>utf-8</encoding>
>   <language>sr</language>
>   <score>6.882</score>
> </encoding-language>
> ...
>
> and no Latvian in sight. Google translate as well as
> detectlanguage.com correctly and with sufficient self-assurance return
> the correct result.
>
> Can someone explain what the reason behind this lack of confidence and
> the wrong detection is? Do you need the right language pack (I'm
> playing around with the developer licence which I thought is
> full-featured)? Is this something that needs training? The doc doesn't
> say so.
>
> Thanks!
>
> cheers,
> Jakob.
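
To make the "lack of confidence" in the quoted result concrete, here is a minimal sketch in Python that parses the three candidates from the message and ranks them by score. The `<results>` wrapper element and the `rank_candidates` helper are assumptions added for illustration only; they are not part of MarkLogic's output or API, and the sketch makes no claim about how the scores are computed.

```python
import xml.etree.ElementTree as ET

# Namespace used by the result elements quoted in the message.
NS = {"d": "xdmp:encoding-language-detect"}

# The three candidates from the message. The <results> wrapper is
# hypothetical: it only exists so the sequence parses as one document.
SAMPLE = """<results>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding><language>hr</language><score>7.081</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding><language>hr</language><score>7.012</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding><language>sr</language><score>6.882</score>
</encoding-language>
</results>"""


def rank_candidates(xml_text):
    """Return (language, encoding, score) tuples, highest score first."""
    root = ET.fromstring(xml_text)
    rows = []
    for el in root.findall("d:encoding-language", NS):
        rows.append((
            el.findtext("d:language", namespaces=NS),
            el.findtext("d:encoding", namespaces=NS),
            float(el.findtext("d:score", namespaces=NS)),
        ))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    ranked = rank_candidates(SAMPLE)
    for language, encoding, score in ranked:
        print(language, encoding, score)
    # Gap between the best and the third candidate.
    print("margin:", round(ranked[0][2] - ranked[-1][2], 3))
```

The gap between the first and third candidate is small, which fits the observation that the detector does not clearly separate the candidates for this text.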
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
