[
https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated OPENNLP-1267:
---------------------------------
Description:
On TIKA-2790, I found that Yalder is stopping after computing character ngrams
on roughly the first 60 characters. That _likely_ explains its impressive
speed. Let's make this "stopping short" feature available in OpenNLP.

Ideally, the language detector wouldn't copy the full String, it wouldn't
normalize the full String, and it wouldn't compute ngrams on the full String.

was:
On TIKA-2790, I found that Yalder is stopping after computing character
ngrams on roughly the first 60 characters. That _likely_ explains its
impressive speed. Let's make this "stopping short" feature available in
OpenNLP.
> Allow the LanguageDetector to stop before processing the full string
> --------------------------------------------------------------------
>
> Key: OPENNLP-1267
> URL: https://issues.apache.org/jira/browse/OPENNLP-1267
> Project: OpenNLP
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
>
> On TIKA-2790, I found that Yalder is stopping after computing character
> ngrams on roughly the first 60 characters. That _likely_ explains its
> impressive speed. Let's make this "stopping short" feature available in
> OpenNLP.
>
> Ideally, the language detector wouldn't copy the full String, it wouldn't
> normalize the full String, and it wouldn't compute ngrams on the full String.
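
As a rough illustration of the "stopping short" idea described above, the sketch below counts character ngrams over only a bounded prefix of the input. It is not OpenNLP or Yalder code: the class name PrefixNgramCounter, the method count, and the maxChars parameter are hypothetical, and per-character lowercasing merely stands in for whatever normalization the real detector applies. Only the prefix is copied and normalized; the rest of the input is never read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: counts character ngrams over
// at most the first maxChars chars of the input.
final class PrefixNgramCounter {

    private PrefixNgramCounter() {
    }

    static Map<String, Integer> count(CharSequence text, int n, int maxChars) {
        if (n <= 0 || maxChars < 0) {
            throw new IllegalArgumentException("n must be > 0 and maxChars >= 0");
        }
        // Stop short: never look past the first maxChars chars.
        // Note: cutting at a char index can split a surrogate pair.
        int limit = Math.min(text.length(), maxChars);

        // Copy and normalize only the prefix, not the full input.
        // Lowercasing is an assumed stand-in for real normalization.
        char[] prefix = new char[limit];
        for (int i = 0; i < limit; i++) {
            prefix[i] = Character.toLowerCase(text.charAt(i));
        }

        // Slide a window of width n over the normalized prefix.
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= limit; i++) {
            counts.merge(new String(prefix, i, n), 1, Integer::sum);
        }
        return counts;
    }
}
```

With a limit of 60 (the figure observed for Yalder above), a call such as PrefixNgramCounter.count(text, 3, 60) does work proportional to the limit rather than to the length of the document. Whether such a short prefix is enough for reliable detection is a separate question that this sketch does not address.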