[ https://issues.apache.org/jira/browse/OPENNLP-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853511#comment-16853511 ]
Tim Allison commented on OPENNLP-1265:
--------------------------------------
As we've discussed on other issues, OpenNLP was not designed to take huge
chunks of text as input (i.e. this experimental design may be stupid), and I
recognize that these tools were designed for strings containing actual
"L". However, in our use case on Tika, we see some crazy stuff, including
plenty of strings that contain no language at all.
So, I defer to you all on how you want to handle the above observations. We
can take care of some of these things on the Tika side, but what is useful for
you? How should I break these items into individual pull requests/tickets?
I'll kick the tires on some other areas next week.
> Improve speed of lang detect
> ----------------------------
>
> Key: OPENNLP-1265
> URL: https://issues.apache.org/jira/browse/OPENNLP-1265
> Project: OpenNLP
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> Over on TIKA-2790, we found that opennlp's language detector is far, far
> slower than Optimaize and yalder.
> Let's use this ticket to see what we can do to improve lang detect's speed.