[ https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856039#comment-16856039 ]
Tim Allison commented on TIKA-2790:
-----------------------------------
I was able to get a 4x speed improvement with OpenNLP, but that is still slower
than Optimaize and far, far slower than Yalder. IIUC, neither Optimaize nor
Yalder processes the full string. Rather, they sample the text or apply some
kind of stopping criterion. I think we can work towards that in our own wrapper
of OpenNLP and, hopefully, push it upstream into OpenNLP.
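A minimal Java sketch of the "stopping criterion" idea, for illustration only. It scores growing prefixes of the input and returns as soon as the top language's confidence reaches a threshold, or when a character cap is hit. `EarlyStopDetector`, `LanguageScorer`, `chunkSize`, `stopConfidence`, and `maxChars` are hypothetical names, not Tika or OpenNLP APIs, and this is not a description of how Optimaize or Yalder work internally. A scorer could presumably be backed by OpenNLP's `LanguageDetectorME.predictLanguages(CharSequence)`, mapping each returned `Language`'s `getConfidence()` into the array, but that adapter is an assumption and is not shown.

```java
import java.util.Objects;

/**
 * Hypothetical early-stopping wrapper: scores growing prefixes of the text
 * and stops once the top language is confident enough. LanguageScorer and
 * all other names here are illustrative assumptions, not Tika/OpenNLP APIs.
 */
final class EarlyStopDetector {

    /** Assumed scorer: one probability per candidate language index (non-empty). */
    interface LanguageScorer {
        double[] score(CharSequence text);
    }

    /** Best language index, its confidence, and how many chars were scored. */
    static final class Result {
        final int language;
        final double confidence;
        final int charsRead;

        Result(int language, double confidence, int charsRead) {
            this.language = language;
            this.confidence = confidence;
            this.charsRead = charsRead;
        }
    }

    private final LanguageScorer scorer;
    private final int chunkSize;         // characters added per round
    private final double stopConfidence; // stop once the top score reaches this
    private final int maxChars;          // hard cap on characters scored

    EarlyStopDetector(LanguageScorer scorer, int chunkSize,
                      double stopConfidence, int maxChars) {
        if (chunkSize <= 0 || maxChars <= 0) {
            throw new IllegalArgumentException("chunkSize and maxChars must be positive");
        }
        this.scorer = Objects.requireNonNull(scorer);
        this.chunkSize = chunkSize;
        this.stopConfidence = stopConfidence;
        this.maxChars = maxChars;
    }

    Result detect(CharSequence text) {
        int limit = Math.min(text.length(), maxChars);
        int end = Math.min(chunkSize, limit);
        while (true) {
            // Re-score the whole prefix read so far.
            double[] probs = scorer.score(text.subSequence(0, end));
            int best = 0;
            for (int i = 1; i < probs.length; i++) {
                if (probs[i] > probs[best]) {
                    best = i;
                }
            }
            // Stop early if confident, or if there is no more text to read.
            if (probs[best] >= stopConfidence || end >= limit) {
                return new Result(best, probs[best], end);
            }
            end = Math.min(end + chunkSize, limit);
        }
    }
}
```

Re-scoring the full prefix each round keeps the sketch simple but can cost roughly O(n²/chunkSize) scorer work when the threshold is never reached; a real wrapper would more likely accumulate character n-gram counts incrementally so each round only processes the new chunk.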
> Consider switching lang-detection in tika-eval to open-nlp
> ----------------------------------------------------------
>
> Key: TIKA-2790
> URL: https://issues.apache.org/jira/browse/TIKA-2790
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
> Attachments: fra_mixed_100000_0.0_0.txt, langid_20190509.zip,
> langid_20190510.zip, langid_20190514.zip, langid_20190514_plus_minus_1.zip
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)