On Wed, Jul 05, 2006, Paul Elschot wrote about "Re: Flexible index format / Payloads Cont'd":
> > Ok, then, I thought to myself - the normal queries and scorers only work
> > on the document level and don't use positions - but SpanQueries have positions
> > so I can create some sort of ProximityBooleanSpanQuery, right? Well,
> > unfortunately, I couldn't figure out how, yet. SpanScorer is a Scorer
>...
>
> SpanQueries can be nested because they pass around
> Spans to higher levels for scoring at the top level of the proximity.

Ok, I've started writing a class I call ProximityBooleanQuery, which, unlike BooleanQuery, will need to contain SpanQueries, not just any Query, as clauses. My idea is that ProximityBooleanQuery's Scorer will sum the scores of the individual clauses (just like in BooleanQuery), but further increase the score depending on how many matches we find near each other in the same document (to figure this out, I will use the Spans of the individual clauses).

One of the peculiar things I noticed while experimenting with this approach is that SpanTermQuery's scorer is different from the regular TermQuery's scorer - its scores always appear multiplied by 1/sqrt(2) compared to TermQuery's scores. Is this deliberate? If not, should it perhaps be fixed?

> So a minimum form of "ProximityBooleanSpanQuery" is already there
> in Lucene. It is implemented by using a SpanScorer as a subscorer
> of a BooleanScorer2, and by having this SpanScorer use the proximity
> information passed up from the bottom level SpanTermQueries, normally
> via some other SpanQuery like SpanNearQuery.

I'm not sure I understand what you mean. Perhaps you mean something like the simple solution I described in a previous mail, where I added to a normal BooleanQuery several additional SpanNearQueries, one for each pair of terms in the query. This solution works quite well, but I thought it was inefficient, which is why I was looking for a more basic solution.

> It might be possible to subclass Scorer to incorporate more position info,
> but SpanQueries have a slightly different take, they use Spans to pass
> the position info around.
> This is also the reason why Lucene has some difficulty in weighting
> the subqueries of a SpanQuery: unlike a Scorer, a Spans does not have
> a score or weight value, and SpanScorer is used to provide the score, but
> only at the top level of the proximity structure.
> This could be changed adding a weight to Spans, or by adding some
> form of position info to (a subclass of) Scorer.

Yes, I think you described the situation well. At this stage, I'll continue to try to develop this feature using Lucene's existing Spans/SpanQuery framework. I hope this is possible, because the ideas you raised (adding weight to Spans or spans to Scorer) will require significant changes to many of Lucene's existing query types, or duplication of these query types, something which I'd rather avoid if possible.

-- 
Nadav Har'El                        | Thursday, Jul 6 2006, 10 Tammuz 5766
IBM Haifa Research Lab              |-----------------------------------------
                                    |Cement mixer collided with a prison van.
http://nadav.harel.org.il           |Look out for sixteen hardened criminals.
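
A minimal sketch of the scoring idea described above (sum of the clause scores, plus a bonus when clauses match near each other in the same document), in plain Java with no Lucene dependency. The class name ProximityScoreSketch, the window and pairBoost parameters, and the flat per-pair bonus are all assumptions made for illustration; this mail does not specify the actual boost function, and a real Scorer would read positions from Spans rather than from arrays.

```java
/**
 * Toy model of the proposed scoring: clause scores are summed, and a
 * fixed bonus is added for every pair of clauses that match within
 * `window` positions of each other in one document.
 * Hypothetical names throughout; this is not Lucene code.
 */
final class ProximityScoreSketch {

    /**
     * Smallest distance between any position in a and any position in b.
     * Both arrays must be sorted ascending (as term positions are).
     * Returns Integer.MAX_VALUE if either array is empty.
     */
    static int minDistance(int[] a, int[] b) {
        int i = 0, j = 0, best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            // Advance the pointer at the smaller position; that is the
            // only move that can shrink the gap.
            if (a[i] < b[j]) i++; else j++;
        }
        return best;
    }

    /**
     * clauseScores[k] is the score of clause k for this document and
     * positions[k] holds the sorted match positions of clause k.
     */
    static float score(float[] clauseScores, int[][] positions,
                       int window, float pairBoost) {
        float total = 0f;
        for (float s : clauseScores) {
            total += s; // same as a plain boolean sum
        }
        for (int i = 0; i < positions.length; i++) {
            for (int j = i + 1; j < positions.length; j++) {
                if (minDistance(positions[i], positions[j]) <= window) {
                    total += pairBoost; // assumed flat bonus per nearby pair
                }
            }
        }
        return total;
    }
}
```

Note that this still looks at every pair of clauses, like the pairwise SpanNearQuery approach, but it does so in a single pass over positions that have already been read for each clause, instead of running a separate subquery per pair.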