[ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710851#action_12710851 ]
Michael McCandless commented on LUCENE-1614:
--------------------------------------------

{quote}
> This would save CPU for scorers that merge multiple sub-scorers (like
> BooleanScorer/2), because instead of having to check for -1 returned from
> each sub-scorer, they could simply proceed with their normal logic and check
> for Integer.MAX_VALUE just before collecting the doc.

But for scorers that use a priority queue, does checking and immediately removing from the queue (hence making the heap smaller) offer any advantages? I had assumed so since this is what current scorers do. Immediately removing scorers also causes early termination for minimumNrMatchers>1 in DisjunctionSumScorer.
{quote}

But that only helps at the tail end of the iteration, vs saving an if check per-sub-scorer X per-next? Ie presumably much more CPU is spent iterating while the PQ is full, than while it's winding down, so saving the if per-sub-scorer-next is better?

Also, I think over time we should migrate away from the PQ (ie, use BooleanScorer's batch approach, not Disjunction*Scorer's PQ) since the batch scoring approach gives better performance. EG I think we should extend BooleanScorer to handle MUST clauses. BooleanScorer handles doc=Integer.MAX_VALUE for a sub-scorer quite efficiently (the chunk is always skipped for that sub-scorer, after one if check).

> Add next() and skipTo() variants to DocIdSetIterator that return the current
> doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch
>
>
> See
> http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
> for the full discussion.
> The basic idea is to add variants to those two
> methods that return the current doc they are at, to save successive calls to
> doc(). If there are no more docs, return -1. A summary of what was discussed
> so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI
> (calls next() and skipTo() respectively, and will be changed to abstract in
> 3.0).
> #* I actually would like to propose an alternative to the names: advance()
> and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead
> of comparing to -1 for improved performance.
> I will post a patch shortly
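
For readers following the -1 vs Integer.MAX_VALUE discussion, here is a minimal sketch of why a "larger than any doc" sentinel removes the per-sub-scorer exhaustion check when merging. It is not Lucene code and not the attached patch: the class SentinelMergeSketch, the ArrayDocIterator type, the NO_MORE_DOCS constant and the union() helper are all hypothetical names made up for illustration. It also uses a plain linear scan for the minimum, not BooleanScorer's batch approach and not a priority queue.

```java
import java.util.ArrayList;
import java.util.List;

final class SentinelMergeSketch {
    /** Hypothetical sentinel for "no more docs"; assumed larger than any real doc id. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical iterator over a sorted int array; stands in for a sub-scorer. */
    static final class ArrayDocIterator {
        private final int[] docs; // must be sorted ascending
        private int pos = -1;

        ArrayDocIterator(int... docs) {
            this.docs = docs;
        }

        /** Advances by one and returns the new current doc, or NO_MORE_DOCS when exhausted. */
        int nextDoc() {
            if (pos < docs.length) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Collects the union of all sub-iterators in increasing doc order, without duplicates. */
    static List<Integer> union(ArrayDocIterator... subs) {
        int[] current = new int[subs.length];
        for (int i = 0; i < subs.length; i++) {
            current[i] = subs[i].nextDoc();
        }
        List<Integer> hits = new ArrayList<>();
        while (true) {
            // Exhausted subs hold NO_MORE_DOCS, so they can never win the min;
            // no "is this sub finished?" branch is needed inside the loop.
            int min = NO_MORE_DOCS;
            for (int doc : current) {
                min = Math.min(min, doc);
            }
            if (min == NO_MORE_DOCS) {
                break; // the single check, done just before collecting
            }
            hits.add(min);
            // Advance only the subs positioned on the collected doc.
            for (int i = 0; i < subs.length; i++) {
                if (current[i] == min) {
                    current[i] = subs[i].nextDoc();
                }
            }
        }
        return hits;
    }
}
```

With a -1 sentinel the same loop would need an extra comparison per sub-iterator per step to keep exhausted subs out of the minimum, which is the cost the comment above argues against.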