[
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399243#comment-13399243
]
Lance Norskog commented on LUCENE-2899:
---------------------------------------
bq. For NER you should try the perceptron and a cutoff of zero.
Thanks! This patch generates all of the models the tests need, and the tests
are rewritten to use the poor-quality data from those models. To build the
models, go to {{solr/contrib/opennlp/src/test-files/training}} and run
{{bin/training.sh}}. This populates
{{solr/contrib/opennlp/src/test-files/opennlp/conf/opennlp}}. I no longer have
a Windows machine, so I can't make a .bat version.
> Add OpenNLP Analysis capabilities as a module
> ---------------------------------------------
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice
> to have a submodule (under analysis) that exposed capabilities for it. Drew
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp