It is not clear this is exactly what is needed/being discussed.

From the issue:
"We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position."
This adds it to a token, not a span. 'same position' does not suggest it also records the end position.

-Glen

On Thu, Dec 13, 2012 at 4:45 PM, Lance Norskog <goks...@gmail.com> wrote:
> Parts-of-speech is available now, in the indexer.
>
> LUCENE-2899 adds OpenNLP to the Lucene&Solr codebase. It does
> parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an Apache
> project for natural-language processing.
>
> Some parts are in Solr that could be in Lucene.
>
> https://issues.apache.org/jira/browse/lucene-2899
>
>
> On 12/12/2012 12:02 PM, Wu, Stephen T., Ph.D. wrote:
>>>>
>>>> Is there any (preliminary) code checked in somewhere that I can look at,
>>>> that would help me understand the practical issues that would need to be
>>>> addressed?
>>>
>>> Maybe we can make this more concrete: what new attribute are you
>>> needing to record in the postings and access at search time?
>>
>> For example:
>> - part of speech of a token.
>> - syntactic parse subtree (over a span).
>> - semantically normalized phrase (to canonical text or ontological
>> code).
>> - semantic group (of a span).
>> - coreference link.
>>
>> stephen
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>

--
-
http://zzzoot.blogspot.com/
-