On Mar 18, 2009, at 7:57 AM, Michael McCandless wrote:
Coming from the discussions in LUCENE-1522 (improving highlighter), I think at some point we should merge Span*Query into their normal counterparts, if possible. I.e., there should be only one TermQuery that can do both what the current TermQuery does and what SpanTermQuery does. It would be able to enumerate the spans/payloads for a given document, and if you don't request those, the performance should hopefully be equal to that of the current TermQuery. The highlighter would in fact request spans for a "normal" TermQuery, on a single-doc index at a time, in order to locate the hits. Likewise for SpanOrQuery, SpanAndQuery. I have no real sense of how much work this is or what problems would ensue (e.g. possible differences in scoring), but from the highlighter's standpoint, ideally all queries need to be able to enumerate the collection of positions that established the match.
Maybe they should all implement a common Interface that provides highlighting info? I don't know what it would be, but it seems easier to do that than to merge them all, though I'm not sure. Not that I wouldn't want to see a simpler query system. There are some cool things you can do w/ spans, but they still have some fundamental flaws that make them annoying. Namely, one of the main reasons you want Spans is b/c you care about what is going on around the match, i.e. co-occurrence data, yet it is still annoying/difficult to get that information w/o pivoting around either term vectors or re-analyzing the document. With the new Attribute stuff, however, it might be getting a little easier, as one could now store offset information at the term level (which you can do w/ payloads, too) and then use that to index into the original String.
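To make that a bit more concrete, here is a rough sketch of what such an interface might look like, and how stored offsets could be used to pull the surrounding text out of the original String. Everything here is hypothetical: MatchOffsets, TermMatchOffsets and ContextWindow are made-up names, not Lucene classes, and the offsets are found with a plain String.indexOf as a stand-in for offsets that would really come from the index (payloads or attributes).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: anything that matched a document can report
// where it matched. This is NOT an existing Lucene API.
interface MatchOffsets {
    // Character offsets [start, end) of each match in the original text.
    List<int[]> offsets(String text);
}

// Stand-in for a term query. A real implementation would read offsets
// stored at index time; String.indexOf is used only to keep this runnable.
final class TermMatchOffsets implements MatchOffsets {
    private final String term;

    TermMatchOffsets(String term) {
        this.term = term;
    }

    @Override
    public List<int[]> offsets(String text) {
        List<int[]> result = new ArrayList<>();
        if (term.isEmpty()) {
            return result;
        }
        int from = 0;
        while (true) {
            int start = text.indexOf(term, from);
            if (start < 0) {
                break;
            }
            result.add(new int[] {start, start + term.length()});
            from = start + term.length();
        }
        return result;
    }
}

// Uses a match's offsets to index into the original String and return
// the match plus up to `window` characters on either side.
final class ContextWindow {
    static String around(String text, int start, int end, int window) {
        int lo = Math.max(0, start - window);
        int hi = Math.min(text.length(), end + window);
        return text.substring(lo, hi);
    }
}
```

A highlighter (or anything that wants co-occurrence data) would call offsets() for a single document's text and then ContextWindow.around() for each hit, w/o needing term vectors or re-analysis, assuming the offsets are available cheaply.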