[
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676698#comment-13676698
]
Lance Norskog commented on LUCENE-2899:
---------------------------------------
I found the problem with multiple documents. The API for reusing Tokenizers
changed to something more sensible, but I only noticed and implemented part of
the change. As a result, when you uploaded multiple documents, it just
re-processed the first document.
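To make that failure mode concrete, here is a minimal, Lucene-free Java sketch of a tokenizer that buffers its whole input (as a sentence detector has to) and is reused across documents. The names `ReusableSentenceTokenizer` and `nextSentence` are hypothetical and only loosely echo Lucene's `setReader`/`reset` reuse cycle. This is not the Lucene API or the code in the patch, and the regex splitter is a stand-in for a real sentence detector. The point it illustrates is that per-document state must be discarded on every reset. If `reset()` rewinds the index but keeps the old buffer, every later document re-emits the first document's sentences.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a reusable tokenizer that buffers its whole input
 * before emitting anything. Illustrative only; NOT the Lucene Tokenizer API.
 */
public class ReusableSentenceTokenizer {
  private Reader input;
  private List<String> buffered; // sentences read from the current document
  private int next;              // index of the next sentence to return

  /** Points the tokenizer at a new document. */
  public void setReader(Reader reader) {
    this.input = reader;
  }

  /** Must be called before consuming each document: drops per-document state. */
  public void reset() {
    buffered = null; // without this line, later documents replay the first one
    next = 0;
  }

  /** Returns the next sentence, or null when the current document is exhausted. */
  public String nextSentence() throws IOException {
    if (buffered == null) {
      buffered = splitSentences(readAll(input));
    }
    return next < buffered.size() ? buffered.get(next++) : null;
  }

  private static String readAll(Reader r) throws IOException {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[1024];
    for (int n; (n = r.read(buf)) != -1; ) {
      sb.append(buf, 0, n);
    }
    return sb.toString();
  }

  // Naive regex splitter standing in for a real sentence detector (assumption).
  private static List<String> splitSentences(String text) {
    List<String> out = new ArrayList<>();
    for (String s : text.split("(?<=[.!?])\\s+")) {
      if (!s.trim().isEmpty()) {
        out.add(s.trim());
      }
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    ReusableSentenceTokenizer tok = new ReusableSentenceTokenizer();
    for (String doc : new String[] {"First doc. Two sentences.", "Second doc."}) {
      tok.setReader(new StringReader(doc));
      tok.reset();
      for (String s; (s = tok.nextSentence()) != null; ) {
        System.out.println(s);
      }
    }
  }
}
```

Running `main` prints `First doc.`, `Two sentences.`, then `Second doc.`. If you delete the `buffered = null;` line, the second document prints `First doc.` and `Two sentences.` again, which is the multiple-document symptom described above.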
File LUCENE-2899-x.patch has this fix. It applies against the 4.x branch and
the trunk. It does not apply against Lucene 4.0, 4.1, 4.2 or 4.3. For all
released Solr versions you want LUCENE-2899.patch from August 27, 2012. There
are no new features since that release.
> Add OpenNLP Analysis capabilities as a module
> ---------------------------------------------
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 4.4
>
> Attachments: LUCENE-2899-current.patch, LUCENE-2899.patch,
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch,
> LUCENE-2899.patch, LUCENE-2899-RJN.patch, LUCENE-2899-x.patch,
> OpenNLPFilter.java, OpenNLPTokenizer.java, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice
> to have a submodule (under analysis) that exposed capabilities for it. Drew
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp
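The "same position" option in the proposal works the way stacked synonyms do in Lucene: a token emitted with a position increment of 0 occupies the same position as the token before it. The sketch below is Lucene-free Java that models only that idea. `PosTagStacking`, `Token`, and `stackTags` are hypothetical names. The tags are supplied by hand rather than by an OpenNLP tagger, and nothing here reflects the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the "same position" option: each word is followed by
 * its part-of-speech tag with a position increment of 0. Not Lucene code.
 */
public class PosTagStacking {
  /** Minimal stand-in for a token: term text plus position increment. */
  static final class Token {
    final String term;
    final int positionIncrement;

    Token(String term, int positionIncrement) {
      this.term = term;
      this.positionIncrement = positionIncrement;
    }
  }

  /** Emits word, tag, word, tag, ...; each tag shares its word's position. */
  static List<Token> stackTags(String[] words, String[] tags) {
    if (words.length != tags.length) {
      throw new IllegalArgumentException("expected exactly one tag per word");
    }
    List<Token> out = new ArrayList<>();
    for (int i = 0; i < words.length; i++) {
      out.add(new Token(words[i], 1)); // advance to the next position
      out.add(new Token(tags[i], 0));  // 0 = stay at the word's position
    }
    return out;
  }

  /** Absolute positions, accumulated from the increments starting at -1. */
  static List<Integer> positions(List<Token> tokens) {
    List<Integer> result = new ArrayList<>();
    int pos = -1;
    for (Token t : tokens) {
      pos += t.positionIncrement;
      result.add(pos);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Token> tokens =
        stackTags(new String[] {"dogs", "bark"}, new String[] {"NNS", "VBP"});
    List<Integer> pos = positions(tokens);
    for (int i = 0; i < tokens.size(); i++) {
      System.out.println(pos.get(i) + " " + tokens.get(i).term);
    }
  }
}
```

Running `main` prints `0 dogs`, `0 NNS`, `1 bark`, `1 VBP`. A real filter would probably need to mark tag terms somehow (for example with a prefix) so that a tag cannot collide with an identical word. The payload alternative would instead attach the tag to the word's own token and leave positions unchanged.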