[ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328244#comment-16328244 ]
Steve Rowe commented on SOLR-11592:
-----------------------------------
A note about model licensing:
I intentionally did not include OpenNLP's pre-trained model in the patch,
because the Leipzig corpora[1] were used to train the model. The Leipzig
corpora's license is CC BY-NC 4.0[2], which is on Apache's Category B list[3];
I think this means the Solr project could redistribute the OpenNLP pre-trained
model, but I am uncertain.
For testing, a model is produced from a small subset of the same source data.
I don't think we need to include licensing info for this test model derived
from Leipzig corpora data, but I'm open to other perspectives.
[1] Leipzig corpora: http://wortschatz.uni-leipzig.de/en/download/
[2] Leipzig corpora Terms of Usage: http://wortschatz.uni-leipzig.de/en/usage
[3] Apache "Category B" 3rd party licenses: https://www.apache.org/legal/resolved.html#category-b
> add another language detector using OpenNLP
> -------------------------------------------
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Components: contrib - LangId
> Affects Versions: 7.1
> Reporter: Koji Sekiguchi
> Priority: Minor
> Attachments: SOLR-11592.patch, SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect.
> This is a ticket that gives users a third option using OpenNLP. :)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)