I would have assumed the many int comparisons would cost less than the 
superfluous disk accesses? (I bow to your considerable experience in this area!)
What is the worst-case scenario for added disk reads? Could it be as bad 
as numberOfSegments x numberOfOtherScorers before the query winds up?
On the index I tried, it looked like an improvement - the spreadsheet I linked 
to has the source for the benchmark on a second worksheet if you want to give 
it a whirl on a different dataset.
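
To make the tradeoff easier to poke at without an index, here is a minimal, Lucene-free sketch built around the while condition quoted further down. ArrayScorer, doNext's earlyExit flag and run() are hypothetical names of mine, not Lucene API, and the toy scorer walks an in-memory int array, so it only counts advance() calls and says nothing about real disk reads.

```java
import java.util.Arrays;
import java.util.Comparator;

class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a sub-scorer over a sorted array of doc ids. */
    static final class ArrayScorer {
        private final int[] docs;
        private int pos = 0;      // starts positioned on the first doc
        int advanceCalls = 0;     // how often advance() was invoked

        ArrayScorer(int[] docs) { this.docs = docs; }

        int docID() {
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves to the first doc >= target, or NO_MORE_DOCS if there is none. */
        int advance(int target) {
            advanceCalls++;
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return docID();
        }
    }

    /** Leapfrog loop; earlyExit adds the (doc != NO_MORE_DOCS) check. */
    static int doNext(ArrayScorer[] scorers, boolean earlyExit) {
        int first = 0;
        int doc = scorers[scorers.length - 1].docID();
        ArrayScorer firstScorer;
        while ((!earlyExit || doc != NO_MORE_DOCS)
                && (firstScorer = scorers[first]).docID() < doc) {
            doc = firstScorer.advance(doc);
            first = (first == scorers.length - 1) ? 0 : first + 1;
        }
        return doc;
    }

    /** Returns {matches, total advance() calls} for the given postings. */
    static int[] run(boolean earlyExit, int[]... postings) {
        ArrayScorer[] scorers = new ArrayScorer[postings.length];
        for (int i = 0; i < postings.length; i++) {
            scorers[i] = new ArrayScorer(postings[i]);
        }
        // The loop assumes the last scorer holds the largest current doc.
        Arrays.sort(scorers, Comparator.comparingInt(ArrayScorer::docID));

        int matches = 0;
        int doc = doNext(scorers, earlyExit);
        while (doc != NO_MORE_DOCS) {
            matches++;
            scorers[scorers.length - 1].advance(doc + 1);
            doc = doNext(scorers, earlyExit);
        }
        int calls = 0;
        for (ArrayScorer s : scorers) {
            calls += s.advanceCalls;
        }
        return new int[] { matches, calls };
    }

    public static void main(String[] args) {
        int[] a = {1, 5, 9}, b = {5, 9, 12}, c = {2, 5, 7, 9, 20};
        int[] plain = run(false, a, b, c);
        int[] early = run(true, a, b, c);
        System.out.println("plain: matches=" + plain[0] + " advanceCalls=" + plain[1]);
        System.out.println("early: matches=" + early[0] + " advanceCalls=" + early[1]);
    }
}
```

On these three toy lists both variants find the same two matches (docs 5 and 9); the plain loop makes 9 advance() calls and the early-exit one makes 7. The difference is one advance(NO_MORE_DOCS) per remaining scorer, which is where my numberOfOtherScorers guess above comes from. Whether each of those calls actually reaches disk in a real index is exactly the part this sketch cannot answer.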



----- Original Message -----
From: Michael McCandless <[email protected]>
To: [email protected]; mark harwood <[email protected]>
Cc: 
Sent: Thursday, 1 March 2012, 13:31
Subject: Re: ConjunctionScorer.doNext() overstays?

Hmm, the tradeoff is an added per-hit check (doc != NO_MORE_DOCS), vs
the one-time cost at the end of calling advance(NO_MORE_DOCS) for each
sub-clause?  I think in general this isn't a good tradeoff?

I.e., what about the case where we AND together high-freq, similarly freq'd
terms?  Then the per-hit check will at some point dominate?

It's valid to pass NO_MORE_DOCS to DocsEnum.advance.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 1, 2012 at 7:22 AM, mark harwood <[email protected]> wrote:
> I got round to some benchmarking of this change on Wikipedia content which 
> shows a small improvement:   http://goo.gl/60wJG
>
> Aside from the small performance gain to be had, it just feels more logical 
> if ConjunctionScorer does not issue sub scorers with a request to advance to 
> "NO_MORE_DOCS".
>
>
>
>
> ----- Original Message -----
> From: mark harwood <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Sent: Thursday, 1 March 2012, 9:39
> Subject: ConjunctionScorer.doNext() overstays?
>
> Due to the odd behaviour of a custom Scorer of mine I discovered 
> ConjunctionScorer.doNext() could loop indefinitely.
> It does not bail out as soon as any scorer.advance() call it makes reports 
> back "NO_MORE_DOCS". Is there not a performance optimisation to be gained in 
> exiting as soon as this happens?
> At this stage I cannot see any point in continuing to advance other scorers - 
> a quick look at TermScorer suggests that any questionable calls made by 
> ConjunctionScorer to advance to NO_MORE_DOCS receive no special treatment 
> and disk will be hit as a consequence.
> I added an extra condition to the while loop on the 3.5 source:
>
>     while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) {
>
> and the JUnit tests passed. I haven't been able to benchmark performance 
> improvements but it looks like it would be sensible to make the change anyway.
>
> Cheers,
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
