I should not have added that note. The OpenNLP patch gives a concrete example of adding an annotation to text.

On 12/13/2012 01:54 PM, Glen Newton wrote:
It is not clear this is exactly what is needed/being discussed.

From the issue:
"We are also planning a Tokenizer/TokenFilter that can put parts of
speech as either payloads (PartOfSpeechAttribute?) on a token or at
the same position."

This adds it to a token, not a span. 'same position' does not suggest
it also records the end position.
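
The token-versus-span distinction can be sketched in plain Java. This is only an illustration under stated assumptions: `TokenAnnotation`, `SpanAnnotation`, and `tagTokens` are hypothetical names, not Lucene or OpenNLP classes, and the tags are made-up sample data. A per-token annotation needs only one position, while a span annotation must also record where it ends.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotationSketch {
    // Hypothetical types for illustration only; not part of Lucene or OpenNLP.

    /** A tag attached to a single token position, e.g. a part of speech. */
    static final class TokenAnnotation {
        final int position;
        final String tag;

        TokenAnnotation(int position, String tag) {
            this.position = position;
            this.tag = tag;
        }
    }

    /** A tag covering token positions start..end (inclusive), e.g. a chunk. */
    static final class SpanAnnotation {
        final int start;
        final int end;
        final String tag;

        SpanAnnotation(int start, int end, String tag) {
            if (end < start) {
                throw new IllegalArgumentException("end must be >= start");
            }
            this.start = start;
            this.end = end;
            this.tag = tag;
        }

        /** True if the given token position falls inside this span. */
        boolean covers(int position) {
            return position >= start && position <= end;
        }
    }

    /** Builds one annotation per token; the array index is the token position. */
    static List<TokenAnnotation> tagTokens(String[] tags) {
        List<TokenAnnotation> result = new ArrayList<>();
        for (int i = 0; i < tags.length; i++) {
            result.add(new TokenAnnotation(i, tags[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "quick", "fox", "jumps"};
        String[] pos = {"DT", "JJ", "NN", "VBZ"}; // sample tags, assumed

        List<TokenAnnotation> perToken = tagTokens(pos);
        // A noun-phrase chunk over the first three tokens: needs start AND end.
        SpanAnnotation np = new SpanAnnotation(0, 2, "NP");

        for (TokenAnnotation t : perToken) {
            String suffix = np.covers(t.position) ? " (inside " + np.tag + ")" : "";
            System.out.println(tokens[t.position] + "/" + t.tag + suffix);
        }
    }
}
```

A payload or same-position token corresponds to `TokenAnnotation` here; nothing in that shape says how far an annotation extends.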

-Glen

On Thu, Dec 13, 2012 at 4:45 PM, Lance Norskog <goks...@gmail.com> wrote:
Part-of-speech tagging is available now, in the indexer.

LUCENE-2899 adds OpenNLP to the Lucene and Solr codebase. It does
part-of-speech tagging, chunking and Named Entity Recognition. OpenNLP is an
Apache project for natural-language processing.

Some parts are in Solr that could be in Lucene.

https://issues.apache.org/jira/browse/lucene-2899


On 12/12/2012 12:02 PM, Wu, Stephen T., Ph.D. wrote:
Is there any (preliminary) code checked in somewhere that I can look at,
that would help me understand the practical issues that would need to be
addressed?
Maybe we can make this more concrete: what new attribute are you
needing to record in the postings and access at search time?
For example:
   - part of speech of a token.
   - syntactic parse subtree (over a span).
   - semantically normalized phrase (to canonical text or ontological
code).
   - semantic group (of a span).
   - coreference link.

stephen


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




