[
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328236#comment-16328236
]
Steve Rowe edited comment on SOLR-11592 at 1/17/18 4:01 AM:
------------------------------------------------------------
[~koji], I've attached a modified version of your patch that I think is ready
to go, including ref guide docs, a {{CHANGES.txt}} entry, and tests; tests and
precommit pass for me. If you have time I'd appreciate a review.
Notable changes from the previous version of the patch:
* Added target {{train-test-models}} to the langid contrib's {{build.xml}}.
This downloads Leipzig corpora data files for five languages, extracts the data
required for OpenNLP to train a model, then trains a test model. The resulting
model is included in the patch.
* Added tests that use the test model.
* Automatically convert from the 3-letter ISO 639-3 codes provided by the
OpenNLP model into the corresponding 2-letter ISO 639-1 codes, to match the
other two langid implementations.
* Modified the update processor factory to interrogate the "invariants" and
"defaults" config sections for the {{langid.model}} param.
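As a rough illustration of the ISO 639-3 to ISO 639-1 conversion mentioned in the list above (this is not the code in the patch): a minimal Java sketch that builds a reverse lookup table from the JDK's {{Locale}} data. The class name {{Iso639Codes}} and the method {{toTwoLetter}} are hypothetical, and passing unrecognized codes through unchanged is an assumption, not necessarily what the patch does.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

// Hypothetical helper, for illustration only; not taken from the SOLR-11592 patch.
final class Iso639Codes {
  // Maps the JDK's 3-letter language codes (e.g. "deu") to 2-letter codes (e.g. "de").
  private static final Map<String, String> THREE_TO_TWO = new HashMap<>();

  static {
    // Locale.getISOLanguages() returns the 2-letter codes the JDK knows about.
    for (String twoLetter : Locale.getISOLanguages()) {
      try {
        String threeLetter = Locale.forLanguageTag(twoLetter).getISO3Language();
        if (!threeLetter.isEmpty()) {
          // Keep the first 2-letter code seen for each 3-letter code.
          THREE_TO_TWO.putIfAbsent(threeLetter, twoLetter);
        }
      } catch (MissingResourceException e) {
        // No 3-letter equivalent available for this code; skip it.
      }
    }
  }

  private Iso639Codes() {}

  // Assumption: codes with no known 2-letter equivalent are returned unchanged.
  static String toTwoLetter(String code) {
    String twoLetter = THREE_TO_TWO.get(code.toLowerCase(Locale.ROOT));
    return twoLetter != null ? twoLetter : code;
  }
}
```

With this table, {{Iso639Codes.toTwoLetter("deu")}} yields {{de}}, while a code with no 2-letter equivalent comes back as given.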
> add another language detector using OpenNLP
> -------------------------------------------
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
> Issue Type: New Feature
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - LangId
> Affects Versions: 7.1
> Reporter: Koji Sekiguchi
> Priority: Minor
> Attachments: SOLR-11592.patch, SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect.
> This ticket gives users a third option, using OpenNLP. :)