On Mar 18, 2009, at 7:57 AM, Michael McCandless wrote:


Coming from the discussions in LUCENE-1522 (improving highlighter), I
think at some point we should merge Span*Query into their normal
counterparts, if possible.

Ie, there should be only one TermQuery that can do both what the
current TermQuery does, and also what SpanTermQuery does.  It's able
to enumerate the spans/payloads for a given document, and if you don't
request those, the performance should hopefully be equal to that of
the current TermQuery.

The highlighter would in fact request spans for a "normal" TermQuery,
on a single doc index at a time, in order to locate the hits.

Likewise for SpanOrQuery, SpanAndQuery.

I have no real sense of how much work this is, what problems would
ensue (eg possible difference in scoring, etc.), but from
highlighter's standpoint, ideally all queries need to be able to
enumerate the collection of positions that established the match.

Maybe they should all implement a common interface that provides highlighting info? I don't know what it would look like, but it seems easier to do that than to merge them all, though I'm not sure. Not that I wouldn't want to see a simpler query system.

There are some cool things you can do w/ spans, but they still have some fundamental flaws that make them annoying. Namely, one of the reasons you often want Spans is b/c you care about what is going on around the match, i.e. co-occurrence data, yet it is still annoying/difficult to get that information w/o pivoting around either term vectors or re-analyzing the document. With the new Attribute stuff, however, it might be getting a little easier, as one could now store offset information at the term level (which you can do w/ payloads, too) and then use that to index into the original String.
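
To make the "common interface" idea a bit more concrete, here is a rough sketch in plain Java. Everything in it is hypothetical: HighlightInfoProvider, MatchOffset, ToyTermMatcher and ContextWindow are names made up for illustration and are not Lucene classes, and the matcher just does a substring scan instead of going through an Analyzer or the index. It only shows the shape of the idea: a query-like object reports character offsets for its matches, and those offsets are used to index into the original String to get at the surrounding text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: a character range [start, end) in the original text. */
final class MatchOffset {
    final int start;
    final int end;

    MatchOffset(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

/** Hypothetical common interface for highlighting info; not part of Lucene. */
interface HighlightInfoProvider {
    /** Returns the offsets in {@code text} that established the match. */
    List<MatchOffset> matchOffsets(String text);
}

/** Toy stand-in for a term query: exact, case-sensitive substring matching. */
final class ToyTermMatcher implements HighlightInfoProvider {
    private final String term;

    ToyTermMatcher(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        this.term = term;
    }

    @Override
    public List<MatchOffset> matchOffsets(String text) {
        List<MatchOffset> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int i = text.indexOf(term, from);
            if (i < 0) {
                break;
            }
            out.add(new MatchOffset(i, i + term.length()));
            from = i + term.length(); // continue after this match
        }
        return out;
    }
}

/** Uses stored offsets to pull the text surrounding a match. */
final class ContextWindow {
    /** Returns the match plus up to {@code radius} characters on each side. */
    static String around(String text, MatchOffset m, int radius) {
        int s = Math.max(0, m.start - radius);
        int e = Math.min(text.length(), m.end + radius);
        return text.substring(s, e);
    }
}
```

With something along these lines, a highlighter (or anything after co-occurrence data) would only need to talk to the interface, e.g. ContextWindow.around(text, hit, 4) for each hit returned by matchOffsets(text), regardless of which query type produced the hits. Whether that really is less work than merging the Span queries into their normal counterparts is still an open question.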

