[MarkLogic Dev General] Fwd: question about xdmp:encoding-language-detect

Jakob Fix Fri, 27 Mar 2015 08:24:17 -0700

Hello, I think this message got lost when the mailing list was down in
February (or nobody has an answer ...)

Thanks,
Jakob.

---------- Forwarded message ----------
From: Jakob Fix <[email protected]>
Date: Sat, Feb 28, 2015 at 10:59 PM
Subject: question about xdmp:encoding-language-detect
To: General Mark Logic Developer Discussion <[email protected]>

Hello,

Using MarkLogic 7.0-3, xdmp:encoding-language-detect, given more than
3,500 characters of Latvian news story text, returns Croatian twice and
Serbian once in the top three results:

<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding>
  <language>hr</language>
  <score>7.081</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding>
  <language>hr</language>
  <score>7.012</score>
</encoding-language>
<encoding-language xmlns="xdmp:encoding-language-detect">
  <encoding>utf-8</encoding>
  <language>sr</language>
  <score>6.882</score>
</encoding-language>
...

and no Latvian in sight. Google Translate and detectlanguage.com both
identify the text as Latvian, and with sufficient self-assurance.

Can someone explain the reason for the low scores and the wrong
detection? Do I need a particular language pack (I'm playing around
with the developer licence, which I thought was full-featured)? Is this
something that needs training? The documentation doesn't say so.

Thanks!

cheers,
Jakob.
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
