On 1/3/2013 6:16 PM, Wu, Stephen T., Ph.D. wrote:
> I think we've been saying that if we put something in a Payload, it will be
> indexed. From what I understand of the indexing format, that means that
> what you put in the Payload will be stored in the Lucene index... But it
> won't *itself* be indexed & optimized for search.
>
> That's good, but can we build inverted indices on the contents of the
> Payloads (or the Attributes) as well?
>
> Ex1: Say I put semantic role labels like ARG0 into my index. Say my search
> is looking for all instances of ARG0.
>
> Ex2: Say I add payloads to terms indicating that they're named entities
> belonging to a semantic group. Then say my query looks for all instances of
> the "Medications" semantic group.
>
> It's almost like just putting these things in different fields, with the
> exception that the things in different fields need to be linked so you know
> what the original text was. Maybe the linking can be done via Payloads
> (offsets in the original text)? If I want to store multiple things at the
> same startOffset then I just use something like SynonymFilter?

I've been working on a different but (in a way) related problem: indexing
text in XML documents. There, we want to associate the names of the
enclosing elements with each term, so that you can search for (say)
"ermine" in the context /doc/title as distinct from "ermine" in the
context //paragraph.

What I've done doesn't use payloads. I index two fields that are relevant
here: a full-text field, which is just the usual per-document text index,
and an element-text field, which indexes each term as the concatenation
of an element name and the term value. Typical terms would be
title:ermine, doc:ermine, and paragraph:ermine. For each word, I index one
such term per enclosing element, all at the same position (the way
SynonymFilter stacks synonyms). This relies on a reserved separator
character (":") that must not appear in any token, which is too bad, but
not terribly restrictive.
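
To make the scheme concrete, here is a rough sketch in plain Python rather
than Lucene's API. It is a toy model only: element_terms, build_index, and
the sample documents are names and data invented for illustration, and a
real implementation would emit these terms from a TokenFilter with a
position increment of 0 for the stacked ones.

```python
from collections import defaultdict

SEP = ":"  # reserved separator; must never occur inside a token


def element_terms(path, words):
    """Yield (term, position) pairs for the words of one text node.

    path  -- enclosing element names, outermost first, e.g. ["doc", "title"]
    words -- the tokens of the text node, in order
    """
    for position, word in enumerate(words):
        if SEP in word:
            raise ValueError(f"token {word!r} contains the separator {SEP!r}")
        for element in path:
            # Every enclosing element gets its own term at the same position.
            yield f"{element}{SEP}{word}", position


def build_index(docs):
    """Map each term to {doc_id: [positions]} (a toy inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text_nodes in docs.items():
        offset = 0  # position of the first word of the current text node
        for path, words in text_nodes:
            for term, position in element_terms(path, words):
                index[term][doc_id].append(offset + position)
            offset += len(words)
    return index


# Hypothetical sample data: each document is a list of (path, words) pairs.
docs = {
    1: [(["doc", "title"], ["ermine", "coats"]),
        (["doc", "paragraph"], ["warm", "fur"])],
    2: [(["doc", "title"], ["winter", "wear"]),
        (["doc", "paragraph"], ["the", "ermine", "is", "white"])],
}
index = build_index(docs)
```

With this data, looking up "title:ermine" finds only document 1, while
"paragraph:ermine" finds only document 2, and "doc:ermine" finds both.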
Something like this might work for you. The prefixing also has the nice
feature that when you enumerate terms, they come back ordered by prefix
first, so all the words in one context are listed together. Of course you
could flip the order (word first, then context) if it were more
interesting to list all "contexts" for a word rather than all words in a
context (or with some POS tag).
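
The ordering point can be illustrated with a sorted list standing in for
the term dictionary. Again this is plain Python, not Lucene; the helper
names terms_with_prefix and flip and the sample terms are assumptions made
up for the example.

```python
from bisect import bisect_left

SEP = ":"  # same reserved separator as in the indexed terms


def terms_with_prefix(sorted_terms, prefix):
    """Return the contiguous run of terms that start with prefix."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


def flip(term):
    """Turn 'element:word' into 'word:element'."""
    element, word = term.split(SEP, 1)
    return f"{word}{SEP}{element}"


# Hypothetical terms, sorted the way a term enumeration would return them.
terms = sorted(["title:ermine", "doc:ermine", "paragraph:ermine",
                "title:coats", "doc:coats", "paragraph:fur", "doc:fur"])

# Element-first order: all words in one context sit next to each other.
words_in_title = terms_with_prefix(terms, "title" + SEP)

# Word-first order: all contexts for one word sit next to each other.
flipped = sorted(flip(t) for t in terms)
contexts_for_ermine = terms_with_prefix(flipped, "ermine" + SEP)
```

Which order to index depends on which listing you need to be cheap: a
single sorted term list only makes one of the two a contiguous range.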
-Mike