Language ID - confidence factor
-------------------------------
Key: NUTCH-960
URL: https://issues.apache.org/jira/browse/NUTCH-960
Project: Nutch
Issue Type: Wish
Affects Versions: 1.2
Reporter: M Alexander
Hi
In the Java implementation, what is the best way to calculate a confidence
value for the result of language identification on a given text?
For example:
(number of matching n-grams / total number of n-grams) * 100
So when a text is passed in, the outcome would be "en" with 89% confidence.
What is the best way to add this to the existing Nutch language identifier
code?
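To make the formula concrete, here is a minimal sketch of the ratio I have in
mind. It does not use any Nutch classes: NgramConfidence, confidence() and the
profile set are hypothetical names made up for illustration, and the profile is
assumed to be simply the set of character n-grams known for one language.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: scores a text against one
// language profile as (matching n-grams / total n-grams) * 100.
public class NgramConfidence {

    /**
     * @param text    the input text to score
     * @param profile assumed set of lower-case character n-grams for one language
     * @param n       n-gram length in characters
     * @return percentage in [0, 100]; 0 if the text is shorter than n
     */
    public static double confidence(String text, Set<String> profile, int n) {
        String normalized = text.toLowerCase(Locale.ROOT);
        int total = normalized.length() - n + 1; // number of n-gram positions
        if (n <= 0 || total <= 0) {
            return 0.0;
        }
        int matched = 0;
        for (int i = 0; i < total; i++) {
            // Slide a window of n characters over the text.
            if (profile.contains(normalized.substring(i, i + n))) {
                matched++;
            }
        }
        return 100.0 * matched / total;
    }
}
```

With a profile containing "the", "he " and "e c", the text "the cat" has five
3-grams of which three match, so confidence("the cat", profile, 3) gives 60.0.
The language with the highest score would be reported together with that score.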
Thanks