[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

Steve Rowe (JIRA) Mon, 11 Jul 2016 15:57:48 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371843#comment-15371843 ]


Steve Rowe commented on LUCENE-2899:
------------------------------------

bq. Steve Rowe, per our conversation yesterday: it would be interesting to store 
the PoS and entity information as stacked tokens instead of (or in addition to) 
the payload, so that you could do "bob @person"~0 or "house @verb"~0 type 
queries, or things like "@person @ceo"~10.

[~sbower], I agree; that possibility would be nice. I looked for an existing 
token type->synonym filter and didn't find one, but I think it would be fairly 
easy to add.
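To make the idea concrete, here is a minimal, self-contained Java model of such a type->synonym step. It does not use Lucene's TokenFilter API. The `Token` record, the `@` prefix, and the class and method names are hypothetical. Each input token is passed through unchanged and is followed by a copy whose term is the token's type (e.g. a PoS or entity tag) with a position increment of 0. The tag therefore sits at the same position as the word, the same way a synonym would. Tokens with the default type (Lucene's default token type is "word") get no extra token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of a type->synonym filter; not Lucene's TokenFilter API.
final class TypeAsSynonymSketch {

    // Minimal stand-in for a token: term text, token type, and position increment.
    record Token(String term, String type, int posInc) {}

    // Emits each token, then a stacked prefix+type token with posInc 0.
    // Tokens whose type is null or equal to defaultType get no stacked token.
    static List<Token> expand(List<Token> in, String prefix, String defaultType) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            out.add(t);
            if (t.type() != null && !t.type().equals(defaultType)) {
                // Position increment 0 places the type token at the original token's position.
                out.add(new Token(prefix + t.type(), t.type(), 0));
            }
        }
        return out;
    }

    // Resolves absolute positions from position increments (first token lands at position 0).
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> pos = new ArrayList<>();
        int p = -1;
        for (Token t : tokens) {
            p += t.posInc();
            pos.add(p);
        }
        return pos;
    }
}
```

For the input bob/person, runs/verb, fast/word, the output is bob, @person, runs, @verb, fast at positions 0, 0, 1, 1, 2. A real Lucene filter would do the same with captureState()/restoreState() and by setting the PositionIncrementAttribute to 0 on the emitted copy.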

Which reminds me: the lemmatization filter I added here should be able to emit 
lemmas as synonyms (as some stemmers can, indirectly). As in the PorterStemmer 
implementation, this is possible simply by not processing any token whose 
keyword attribute is set to true, and by preceding the filter with 
KeywordRepeatFilter.
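The keyword-based trick can be modeled in plain Java as follows. This is a stand-alone sketch and not the filter from the patch. The `Token` record, the toy lemma dictionary (standing in for the OpenNLP lemmatizer), and the class and method names are hypothetical. `keywordRepeat` mirrors what KeywordRepeatFilter does: it emits each token twice at the same position, with the first copy marked as a keyword. `lemmatize` skips keyword tokens, the way the stemming filters do. `removeDuplicates` plays the role of RemoveDuplicatesTokenFilter and drops a lemma identical to its surface form at the same position.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of KeywordRepeatFilter -> lemmatizer -> RemoveDuplicatesTokenFilter.
final class LemmaAsSynonymSketch {

    record Token(String term, int posInc, boolean keyword) {}

    // Like KeywordRepeatFilter: the original token marked as keyword, then an unmarked copy at posInc 0.
    static List<Token> keywordRepeat(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            out.add(new Token(t.term(), t.posInc(), true));
            out.add(new Token(t.term(), 0, false));
        }
        return out;
    }

    // Lemmatizes only non-keyword tokens; keyword tokens keep their surface form.
    static List<Token> lemmatize(List<Token> in, Map<String, String> lemmas) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            String term = t.keyword() ? t.term() : lemmas.getOrDefault(t.term(), t.term());
            out.add(new Token(term, t.posInc(), t.keyword()));
        }
        return out;
    }

    // Drops a token whose term repeats an earlier token at the same position.
    static List<Token> removeDuplicates(List<Token> in) {
        List<Token> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : in) {
            if (t.posInc() > 0) {
                seenAtPosition.clear(); // moved on to a new position
            }
            if (!seenAtPosition.add(t.term())) {
                continue; // same term already emitted at this position
            }
            out.add(t);
        }
        return out;
    }
}
```

For "mice ran home" with a toy dictionary mapping mice to mouse and ran to run, the chain yields mice/mouse stacked at the first position, ran/run at the second, and a single home at the third.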

> Add OpenNLP Analysis capabilities as a module
> ---------------------------------------------
>
>                 Key: LUCENE-2899
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2899
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 4.9, 6.0
>
>         Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

Reply via email to