I would have assumed the many int comparisons would cost less than the superfluous disk accesses? (I bow to your considerable experience in this area!) What is the worst-case scenario on added disk reads? Could it be as bad as numberOfSegments x numberOfOtherScorers before the query winds up? On the index I tried, it looked like an improvement - the spreadsheet I linked to has the source for the benchmark on a second worksheet if you want to give it a whirl on a different dataset.
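To make that worst case concrete, here is a back-of-envelope comparison of the two costs being weighed. All numbers below are invented for illustration (they are not measurements, and the helper names are mine, not Lucene's):

```java
// Back-of-envelope comparison of the two costs in the thread. All numbers
// are hypothetical, chosen only to illustrate the asymmetry.
public class TradeoffEstimate {

    /** One-time cost without the early exit: at the end of the query, each
     *  segment's remaining sub-scorers get a superfluous advance(NO_MORE_DOCS),
     *  each of which may touch disk. */
    static long extraAdvances(int segments, int otherScorers) {
        return (long) segments * otherScorers;
    }

    /** Recurring cost of the early exit: one extra int comparison per
     *  iteration of the leapfrog loop, i.e. proportional to candidate docs. */
    static long extraComparisons(long loopIterations) {
        return loopIterations;
    }

    public static void main(String[] args) {
        // e.g. 20 segments, a 3-clause conjunction (2 "other" scorers),
        // and a loop that visits 5M candidate docs.
        System.out.println("extra advance() calls avoided: " + extraAdvances(20, 2));
        System.out.println("extra int comparisons paid:    " + extraComparisons(5_000_000L));
    }
}
```

This is Mike's point in a nutshell: the avoided advances are a small one-time cost, while the added comparison is paid on every loop step, so for high-frequency clauses the check can dominate.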
----- Original Message -----
From: Michael McCandless <[email protected]>
To: [email protected]; mark harwood <[email protected]>
Cc:
Sent: Thursday, 1 March 2012, 13:31
Subject: Re: ConjunctionScorer.doNext() overstays?

Hmm, the tradeoff is an added per-hit check (doc != NO_MORE_DOCS), vs
the one-time cost at the end of calling advance(NO_MORE_DOCS) for each
sub-clause?

I think in general this isn't a good tradeoff? Ie what about the case
where we AND high-freq, and similarly freq'd, terms together? Then, the
per-hit check will at some point dominate?

It's valid to pass NO_MORE_DOCS to DocsEnum.advance.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 1, 2012 at 7:22 AM, mark harwood <[email protected]> wrote:
> I got round to some benchmarking of this change on Wikipedia content which
> shows a small improvement: http://goo.gl/60wJG
>
> Aside from the small performance gain to be had, it just feels more logical
> if ConjunctionScorer does not issue sub-scorers with a request to advance to
> "NO_MORE_DOCS".
>
> ----- Original Message -----
> From: mark harwood <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Sent: Thursday, 1 March 2012, 9:39
> Subject: ConjunctionScorer.doNext() overstays?
>
> Due to the odd behaviour of a custom Scorer of mine I discovered
> ConjunctionScorer.doNext() could loop indefinitely.
> It does not bail out as soon as any scorer.advance() call it makes reports
> back "NO_MORE_DOCS". Is there not a performance optimisation to be gained in
> exiting as soon as this happens?
> At this stage I cannot see any point in continuing to advance other scorers -
> a quick look at TermScorer suggests that any questionable calls made by
> ConjunctionScorer to advance to NO_MORE_DOCS receive no special treatment
> and disk will be hit as a consequence.
> I added an extra condition to the while loop on the 3.5 source:
>
> while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) {
>
> and the JUnit tests passed. I haven't been able to benchmark performance
> improvements but it looks like it would be sensible to make the change anyway.
>
> Cheers,
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
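For readers following along, the leapfrog loop under discussion and the proposed early exit can be sketched in miniature. This is an illustrative reconstruction, not the actual Lucene 3.5 source: FixedScorer and its advance-call counter are invented stand-ins for Lucene's Scorer, and only the shape of doNext() mirrors the real code.

```java
// Sketch of ConjunctionScorer.doNext()'s leapfrog loop, with the proposed
// (doc != NO_MORE_DOCS) early exit toggleable for comparison. FixedScorer
// is a toy stand-in invented for this example, not Lucene's Scorer API.
public class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Toy sub-scorer over a fixed, sorted list of matching doc ids. */
    static class FixedScorer {
        final int[] docs;
        int pos = 0;
        int advanceCalls = 0; // stands in for postings/disk reads

        FixedScorer(int... docs) { this.docs = docs; }

        int docID() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

        /** Returns the first matching doc >= target, or NO_MORE_DOCS. */
        int advance(int target) {
            advanceCalls++;
            while (pos < docs.length && docs[pos] < target) pos++;
            return docID();
        }
    }

    /**
     * Leapfrog to the next doc all scorers agree on. With earlyExit=true,
     * the loop stops as soon as any scorer reports NO_MORE_DOCS instead of
     * still asking the remaining scorers to advance(NO_MORE_DOCS).
     */
    static int doNext(FixedScorer[] scorers, boolean earlyExit) {
        int first = 0;
        int doc = scorers[scorers.length - 1].docID();
        FixedScorer firstScorer;
        while ((!earlyExit || doc != NO_MORE_DOCS)
                && (firstScorer = scorers[first]).docID() < doc) {
            doc = firstScorer.advance(doc);
            first = (first == scorers.length - 1) ? 0 : first + 1;
        }
        return doc;
    }

    static int totalAdvanceCalls(FixedScorer[] scorers) {
        int total = 0;
        for (FixedScorer s : scorers) total += s.advanceCalls;
        return total;
    }

    public static void main(String[] args) {
        // The first sub-clause exhausts straight away; with the early exit
        // the second scorer is never advanced to NO_MORE_DOCS at all.
        FixedScorer[] scorers = { new FixedScorer(2, 4), new FixedScorer(10, 20) };
        System.out.println("next doc:      " + doNext(scorers, true));
        System.out.println("advance calls: " + totalAdvanceCalls(scorers));
    }
}
```

Running the same scenario with earlyExit=false costs one extra advance() call (the second scorer is told to advance(NO_MORE_DOCS)), which is exactly the superfluous work Mark's extra loop condition removes; both variants still agree on matching docs.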
