[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707822#comment-16707822
]
Ken Krugler commented on TIKA-2790:
-----------------------------------
[[email protected]] - I've compared Yalder to Optimaize's version of
language-detector. For the EuroParl sample (21 languages, 1000 chunks of text
for each):
Detector                        Time      Error rate
language-detector               1201ms    0.29%
yalder                           591ms    0.06%
yalder (20 ngram min count)      555ms    0.085%
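
To put these percentages in perspective, the sketch below converts them to approximate misclassification counts and relative speed. It assumes the error rate is misclassified chunks divided by total chunks, and that the total is 21 x 1000 = 21,000 chunks; neither assumption is stated in the comment. The helper names `approx_errors` and `speedup` are hypothetical and not part of either library.

```python
# Assumption: error rate = misclassified chunks / total chunks, with
# 21 languages x 1000 chunks each = 21,000 chunks in the EuroParl sample.
TOTAL_CHUNKS = 21 * 1000

# Timings (ms) and error rates (%) exactly as reported in the comment.
results = {
    "language-detector": (1201, 0.29),
    "yalder": (591, 0.06),
    "yalder (20 ngram min count)": (555, 0.085),
}


def approx_errors(error_rate_pct, total=TOTAL_CHUNKS):
    """Approximate number of misclassified chunks implied by a percentage."""
    return round(error_rate_pct / 100 * total)


def speedup(baseline_ms, other_ms):
    """How many times faster `other_ms` is than `baseline_ms`."""
    return baseline_ms / other_ms


if __name__ == "__main__":
    baseline_ms = results["language-detector"][0]
    for name, (ms, err_pct) in results.items():
        print(
            f"{name}: ~{approx_errors(err_pct)} misclassified chunks, "
            f"{speedup(baseline_ms, ms):.2f}x the speed of language-detector"
        )
```

Under those assumptions the reported rates correspond to roughly 61, 13, and 18 misclassified chunks out of 21,000, and both Yalder runs take about half the time of language-detector.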
> Consider switching lang-detection in tika-eval to open-nlp
> ----------------------------------------------------------
>
> Key: TIKA-2790
> URL: https://issues.apache.org/jira/browse/TIKA-2790
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
>