[EMAIL PROTECTED] wrote:
  Bug in PhraseScorer and ConjunctionScorer
  skipTo implementation fixed.

Christoph,


Thanks for finding fixing these. This reminds me of a couple of scorer-related optimizations that we should keep in mind:

1. As you've noticed, the logic of ConjunctionScorer and PhraseScorer (and SpansNear, for that matter) are very similar, except ConjunctionScorer uses java's collection classes, while PhraseScorer uses Lucene's PriorityQueue and a home-grown linked list. Because of this, PhraseScorer allocates no new objects while scoring, while ConjuctionScorer must allocate a number of objects each time skipTo() is called. ConjunctionScorer is still usually much faster than BooleanScorer, since it uses SkipTo, but it could be much faster yet. So someday we should probably re-write ConjunctionScorer to use a PriorityQueue, so that it allocates fewer objects while scoring.

2. BooleanScorer does not return hits in strict document-id order (arguably a bug). Because of this, ConjunctionScorer cannot be used when any of its sub-scorers use BooleanScorer, i.e., with sub-queries that contain optional or prohibited clauses. This is unfortunate, as there are probably lots of cases where a conjunction of disjunctions could be accellerated by skipTo() that we are not accellerating. One way to fix this is to re-write BooleanScorer so that it always returns hits in order, using an algorithm like that in SpanOrQuery.getSpans(). However, that would make large disjunctions somewhat slower, by a factor of log(n), where n is the number of clauses in the disjunction. I don't know whether this would really be significant. Another approach might be to write a version of BooleanScorer that returns hits in order, and only to use it when it is embedded in a conjunction. This is trickier to implement, and it would be best if BooleanScorer always returned hits in the correct order, so I lean slightly towards the first approach.

Cheers,

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to