[jira] [Comment Edited] (TIKA-2790) Consider switching lang-detection in tika-eval to open-nlp

Tim Allison (JIRA) Mon, 03 Dec 2018 07:12:00 -0800


    [ https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707320#comment-16707320 ]


Tim Allison edited comment on TIKA-2790 at 12/3/18 3:06 PM:
------------------------------------------------------------

We're currently using optimaize in tika-eval.  OpenNLP appears to have better 
coverage, and seems to be a healthier/more active project.

So, no, not really...once I fixed the regex problem (TIKA-2777). :D  But more 
coverage might be nice.

The other item is that I'd like to update our "common words" counts, and I 
notice that I can easily check out a large chunk of the Leipzig corpora from 
the OpenNLP repository: https://svn.apache.org/repos/bigdata/opennlp/trunk 

So, rather than having to do my own download of wikis, one by one, I can 
download a bunch of data easily, and that data would align with the language 
detection.

What's your recommendation?


was (Author: [email protected]):
We're currently using optimaize in tika-eval.  OpenNLP appears to have better 
coverage, and seems to be a healthier/more active project.

So, no, not really...once I fixed the regex problem (TIKA-2777). :D  But more 
coverage might be nice.

What's your recommendation?

> Consider switching lang-detection in tika-eval to open-nlp
> ----------------------------------------------------------
>
>                 Key: TIKA-2790
>                 URL: https://issues.apache.org/jira/browse/TIKA-2790
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
