Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

Lance Norskog Thu, 13 Dec 2012 13:46:03 -0800

Parts-of-speech is available now, in the indexer.

LUCENE-2899 adds OpenNLP to the Lucene & Solr codebase. It does parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an Apache project for natural-language processing.


Some parts currently live in Solr but could be in Lucene.

https://issues.apache.org/jira/browse/lucene-2899
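
To make the idea of a per-token attribute concrete, here is a minimal toy sketch in plain Java of an inverted index that records a part-of-speech tag with each position and filters on it at search time. It is not the LUCENE-2899 patch and it uses neither the Lucene nor the OpenNLP API: the class PosPostingsSketch, its methods, and the pre-tagged input are all hypothetical, and the tags are assumed to come from some upstream tagger. In Lucene itself, per-position payloads are one place such a tag could be stored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: a hypothetical in-memory index, not a Lucene codec.
public class PosPostingsSketch {

    /** One occurrence of a term: document id, token position, POS tag. */
    private static final class Posting {
        final int docId;
        final int position;
        final String pos;

        Posting(int docId, int position, String pos) {
            this.docId = docId;
            this.position = position;
            this.pos = pos;
        }
    }

    // term -> list of occurrences, each carrying its own POS tag
    private final Map<String, List<Posting>> postings = new HashMap<>();

    /** Index pre-tagged tokens; tags[i] is the (assumed) POS tag of tokens[i]. */
    public void addDocument(int docId, String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must align");
        }
        for (int i = 0; i < tokens.length; i++) {
            String term = tokens[i].toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new ArrayList<>())
                    .add(new Posting(docId, i, tags[i]));
        }
    }

    /** Ids of documents in which the term occurs with the given POS tag. */
    public List<Integer> search(String term, String pos) {
        List<Integer> hits = new ArrayList<>();
        List<Posting> list = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<Posting>emptyList());
        for (Posting p : list) {
            if (p.pos.equals(pos) && !hits.contains(p.docId)) {
                hits.add(p.docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PosPostingsSketch index = new PosPostingsSketch();
        index.addDocument(0, new String[] {"they", "book", "flights"},
                             new String[] {"PRP", "VBP", "NNS"});
        index.addDocument(1, new String[] {"the", "book", "fell"},
                             new String[] {"DT", "NN", "VBD"});
        // Only document 1 uses "book" as a noun.
        System.out.println(index.search("book", "NN"));
    }
}
```

Searching for "book" tagged NN matches only the second document, while the same term tagged VBP matches only the first, which is the kind of distinction a plain term query cannot make.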

On 12/12/2012 12:02 PM, Wu, Stephen T., Ph.D. wrote:

Is there any (preliminary) code checked in somewhere that I can look at,
that would help me understand the practical issues that would need to be
addressed?

Maybe we can make this more concrete: what new attribute are you
needing to record in the postings and access at search time?

For example:
  - part of speech of a token.
  - syntactic parse subtree (over a span).
  - semantically normalized phrase (to canonical text or ontological code).
  - semantic group (of a span).
  - coreference link.

stephen


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
