Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

Glen Newton Thu, 13 Dec 2012 13:54:51 -0800

It is not clear that this is exactly what is needed or being discussed.

From the issue:
"We are also planning a Tokenizer/TokenFilter that can put parts of
speech as either payloads (PartOfSpeechAttribute?) on a token or at
the same position."


This attaches the part of speech to a single token, not to a span. 'At the
same position' does not suggest that an end position is recorded as well.
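
To make the distinction concrete, here is a minimal sketch in plain Java. It
does not use the Lucene or OpenNLP APIs; the class and method names
(AnnotationSketch, TokenTag, SpanTag, tagTokens, covers) are hypothetical and
exist only for illustration. A token-level tag needs one position, while a
span-level label needs a start and an end position.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical types, not part of Lucene or OpenNLP.
public class AnnotationSketch {

    /** Token-level annotation: one tag attached to one token position. */
    public static final class TokenTag {
        public final int position;
        public final String term;
        public final String tag;

        public TokenTag(int position, String term, String tag) {
            this.position = position;
            this.term = term;
            this.tag = tag;
        }
    }

    /** Span-level annotation: a label over positions start..end, both inclusive. */
    public static final class SpanTag {
        public final int start;
        public final int end;
        public final String label;

        public SpanTag(int start, int end, String label) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("invalid span: " + start + ".." + end);
            }
            this.start = start;
            this.end = end;
            this.label = label;
        }

        /** True if the given token position falls inside this span. */
        public boolean covers(int position) {
            return start <= position && position <= end;
        }
    }

    /** Pairs each term with its tag, using the array index as the token position. */
    public static List<TokenTag> tagTokens(String[] terms, String[] tags) {
        if (terms.length != tags.length) {
            throw new IllegalArgumentException("terms and tags must have equal length");
        }
        List<TokenTag> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            out.add(new TokenTag(i, terms[i], tags[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] terms = {"the", "quick", "fox", "jumps"};
        String[] tags = {"DT", "JJ", "NN", "VBZ"};
        List<TokenTag> tokens = tagTokens(terms, tags);

        // A noun phrase over positions 0..2; a per-token tag cannot express this extent.
        SpanTag nounPhrase = new SpanTag(0, 2, "NP");

        for (TokenTag t : tokens) {
            System.out.println(t.position + " " + t.term + "/" + t.tag
                    + " inNP=" + nounPhrase.covers(t.position));
        }
    }
}
```

The point of the sketch is that TokenTag carries no extent, so anything stored
per token at a single position would still need an end position (or a length)
added somewhere to represent a parse subtree or a semantic group.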

-Glen

On Thu, Dec 13, 2012 at 4:45 PM, Lance Norskog <[email protected]> wrote:
> Parts-of-speech is available now, in the indexer.
>
> LUCENE-2899 adds OpenNLP to the Lucene & Solr codebase. It does
> parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an Apache
> project for natural-language processing.
>
> Some parts are in Solr that could be in Lucene.
>
> https://issues.apache.org/jira/browse/lucene-2899
>
>
> On 12/12/2012 12:02 PM, Wu, Stephen T., Ph.D. wrote:
>>>>
>>>> Is there any (preliminary) code checked in somewhere that I can look at,
>>>> that would help me understand the practical issues that would need to be
>>>> addressed?
>>>
>>> Maybe we can make this more concrete: what new attribute are you
>>> needing to record in the postings and access at search time?
>>
>> For example:
>>   - part of speech of a token.
>>   - syntactic parse subtree (over a span).
>>   - semantically normalized phrase (to canonical text or ontological
>> code).
>>   - semantic group (of a span).
>>   - coreference link.
>>
>> stephen
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>



-- 
-
http://zzzoot.blogspot.com/
-

