[ 
https://issues.apache.org/jira/browse/OPENNLP-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16853462#comment-16853462
 ] 

Tim Allison edited comment on OPENNLP-1265 at 5/31/19 11:25 PM:
----------------------------------------------------------------

Baseline:
Input string: 10000x "estava em uma marcenaria na Rua Bruno "
model: langdetect-183.bin
runs: 4 runs of 50 detections each (results for the first, warmup run are not shown)

Results (millis, lang)
13366 : por=50
13608 : por=50
14035 : por=50
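
For anyone who wants to reproduce this kind of measurement, here is a rough sketch of a harness with the same shape as the setup above: build the repeated input, do 4 runs of 50 detections, and drop the first run as warmup. This is not the code that produced the numbers in this comment. DetectBenchmark, timeRuns and buildInput are hypothetical names, and the detector is passed in as a plain Function so the sketch stays self-contained; the stand-in in main always answers "por" and would need to be replaced by a call into a real language detector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical timing harness; not the code used for the numbers above.
final class DetectBenchmark {

    // Repeat the phrase `repeats` times to build one long input string.
    static String buildInput(String phrase, int repeats) {
        StringBuilder sb = new StringBuilder(phrase.length() * repeats);
        for (int i = 0; i < repeats; i++) {
            sb.append(phrase);
        }
        return sb.toString();
    }

    // Execute `runs` runs of `detectionsPerRun` detections each.
    // The first run is treated as warmup and its timing is discarded.
    static List<Long> timeRuns(Function<String, String> detector, String input,
                               int runs, int detectionsPerRun) {
        List<Long> millis = new ArrayList<>();
        for (int run = 0; run < runs; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < detectionsPerRun; i++) {
                detector.apply(input);
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (run > 0) {
                millis.add(elapsedMillis);
            }
        }
        return millis;
    }

    public static void main(String[] args) {
        String input = buildInput("estava em uma marcenaria na Rua Bruno ", 10000);
        // Stand-in detector (assumption): replace with a real detector call.
        Function<String, String> detector = text -> "por";
        for (long ms : timeRuns(detector, input, 4, 50)) {
            System.out.println(ms + " millis");
        }
    }
}
```

With a trivial stand-in detector the timings are meaningless; the point is only the warmup-then-measure structure.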

If we switch to working with String-based ngrams instead of StringList, there's 
a roughly 2x improvement (about 13.7 s down to about 6.1 s per run):
6087 : por=50
6202 : por=50
6146 : por=50

see: 
https://github.com/tballison/opennlp/blob/OPENNLP-1265/opennlp-tools/src/main/java/opennlp/tools/ngram/NGramModelSimplified.java
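
To illustrate the general idea of String-keyed ngrams, here is a minimal sketch of counting character ngrams in a map keyed directly by String, so that no wrapper object has to be allocated and hashed for each ngram. This is not the code in NGramModelSimplified linked above. StringNGramCounter and its count method are hypothetical names, and the 1..3 length range in main is an arbitrary choice for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of String-keyed ngram counting;
// not the implementation in NGramModelSimplified.
final class StringNGramCounter {

    // Count all character ngrams with lengths minLength..maxLength (inclusive).
    // Each ngram is stored as a plain String key.
    static Map<String, Integer> count(CharSequence text, int minLength, int maxLength) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toString();
        for (int n = minLength; n <= maxLength; n++) {
            for (int start = 0; start + n <= s.length(); start++) {
                // One substring per ngram, used directly as the map key.
                counts.merge(s.substring(start, start + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count("estava em uma marcenaria na Rua Bruno ", 1, 3);
        System.out.println("distinct ngrams: " + counts.size());
        System.out.println("count of \"ma\": " + counts.get("ma"));
    }
}
```

Whether this accounts for all of the speedup reported above is not something the sketch can show; it only demonstrates the data-structure change being discussed.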


> Improve speed of lang detect
> ----------------------------
>
>                 Key: OPENNLP-1265
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1265
>             Project: OpenNLP
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> Over on TIKA-2790, we found that OpenNLP's language detector is far, far 
> slower than Optimaize and yalder.
> Let's use this ticket to see what we can do to improve lang detect's speed.



