[ 
https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862283#comment-16862283
 ] 

Tim Allison commented on OPENNLP-1270:
--------------------------------------

Performance on the current languages doesn't change significantly:

||Length||Time (ms)||Accuracy||
|10|115|0.64|
|20|91|0.84|
|30|106|0.89|
|40|111|0.92|
|50|125|0.92|
|100|282|0.92|
|150|325|0.95|
|200|419|0.96|
|500|1078|0.98|
|1000|1936|1.00|
|5000|8970|1.00|
|10000|17550|0.99|
|20000|36464|1.00|

For a comparison with the current model, see [OPENNLP-1261|https://issues.apache.org/jira/browse/OPENNLP-1261?focusedCommentId=16862253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16862253].

> Add new languages to the language detector
> ------------------------------------------
>
>                 Key: OPENNLP-1270
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1270
>             Project: OpenNLP
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: report.txt
>
>
> Leipzig has several other languages that might be useful to add to the 
> language detector.  I've selected some with > 10k sentences.  Once I build 
> the model and evaluate performance, I'll share the reports, the model and a 
> tgz of the *-sentences.txt files.
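
The selection step described in the issue (keeping corpora with more than 10k sentences) could be sketched as follows. This is only an illustration: select_corpora is a hypothetical helper, and it assumes the downloaded files sit in one directory, are named *-sentences.txt, and hold one sentence per non-empty line.

```python
from pathlib import Path


def select_corpora(corpus_dir, min_sentences=10_000):
    """Return {filename: sentence_count} for files above the threshold.

    Assumes one sentence per non-empty line in each *-sentences.txt file.
    """
    selected = {}
    for path in sorted(Path(corpus_dir).glob("*-sentences.txt")):
        with path.open(encoding="utf-8") as handle:
            count = sum(1 for line in handle if line.strip())
        # Strictly greater than, matching "> 10k sentences".
        if count > min_sentences:
            selected[path.name] = count
    return selected
```

Calling select_corpora("corpora") would then list the candidate files and their sentence counts, where "corpora" is a placeholder directory name.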



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
