[ 
https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated OPENNLP-1267:
---------------------------------
    Description: 
On TIKA-2790, I found that Yalder stops after computing character n-grams 
on roughly the first 60 characters.  That _likely_ explains its impressive 
speed.  Let's make this "stopping short" feature available in OpenNLP.

 

Ideally, the language detector wouldn't copy the full String, it wouldn't 
normalize the full String, and it wouldn't compute ngrams on the full String.

  was:On TIKA-2790, I found that Yalder is stopping after computing character 
ngrams on roughly the first 60 characters.  That _likely_ explains its 
impressive speed.  Let's make this "stopping short" feature available in 
OpenNLP.


> Allow the LanguageDetector to stop before processing the full string
> --------------------------------------------------------------------
>
>                 Key: OPENNLP-1267
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1267
>             Project: OpenNLP
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
>
> On TIKA-2790, I found that Yalder stops after computing character 
> n-grams on roughly the first 60 characters.  That _likely_ explains its 
> impressive speed.  Let's make this "stopping short" feature available in 
> OpenNLP.
>  
> Ideally, the language detector wouldn't copy the full String, it wouldn't 
> normalize the full String, and it wouldn't compute ngrams on the full String.
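
A minimal sketch of the "stopping short" idea, for illustration only. It is
not OpenNLP's LanguageDetector code and not Yalder's implementation; the
class TruncatedNgrams, the method countNgrams, and the maxChars parameter are
hypothetical names. It assumes that lowercasing stands in for normalization
and that the cutoff is measured in Java chars, so a surrogate pair at the
boundary could be split. The point is that only the bounded prefix is read,
normalized, and turned into n-grams; the full input is never copied.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP.
final class TruncatedNgrams {

    private TruncatedNgrams() {
    }

    /**
     * Counts character n-grams of size n over at most the first maxChars
     * characters of text. Characters past the cutoff are never read.
     */
    static Map<String, Integer> countNgrams(CharSequence text, int n, int maxChars) {
        if (n <= 0 || maxChars < 0) {
            throw new IllegalArgumentException("n must be > 0 and maxChars must be >= 0");
        }
        // Stop short: the scan ends at the cutoff or at the end of the input.
        int end = Math.min(text.length(), maxChars);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= end; i++) {
            // Copy and normalize only the current n-character window.
            // Lowercasing is an assumed stand-in for real normalization.
            String gram = text.subSequence(i, i + n).toString().toLowerCase(Locale.ROOT);
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }
}
```

With a cutoff of around 60 characters, as observed for Yalder above, a call
such as TruncatedNgrams.countNgrams(text, 3, 60) does a bounded amount of
work no matter how long the input is.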



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
