[ 
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328244#comment-16328244
 ] 

Steve Rowe commented on SOLR-11592:
-----------------------------------

A note about model licensing: 

I intentionally did not include OpenNLP's pre-trained model in the patch, 
because the Leipzig corpora[1] were used to train the model.  The Leipzig 
corpora's license is CC BY 4.0[2], which is on Apache's Category B list[3]; 
I think this means the Solr project could redistribute the OpenNLP pre-trained 
model, but I am uncertain.

For testing, a model is produced from a small subset of the same source data.  
I don't think we need to include licensing info for this test model derived 
from Leipzig corpora data, but I'm open to other perspectives.

[1] Leipzig corpora: http://wortschatz.uni-leipzig.de/en/download/
[2] Leipzig corpora Terms of Usage: http://wortschatz.uni-leipzig.de/en/usage
[3] Apache "Category B" 3rd party licenses: 
https://www.apache.org/legal/resolved.html#category-b

> add another language detector using OpenNLP
> -------------------------------------------
>
>                 Key: SOLR-11592
>                 URL: https://issues.apache.org/jira/browse/SOLR-11592
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LangId
>    Affects Versions: 7.1
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>         Attachments: SOLR-11592.patch, SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. 
> This is a ticket that gives users a third option using OpenNLP. :)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
