[ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710880#action_12710880 ]
Michael McCandless commented on LUCENE-1614:
--------------------------------------------

bq. About using Integer.MAX_VALUE as sentinel, did anyone consider what happens when the first index actually reaches that number of documents?

Lucene already uses Integer.MAX_VALUE as a sentinel (e.g. the score(Collector) methods in Term/BooleanScorer/2), so a Lucene index can already only contain Integer.MAX_VALUE docs.

bq. On moving from the priority queue (DisjunctionSumScorer/BooleanScorer2) to the batch approach (BooleanScorer): I did not find a way to do that while scoring docs in docId order.

What breaks if we allow docs to be collected out-of-order (besides external Hit/Collector)? As of LUCENE-1575, the core collectors can gain performance if they know the docs will be collected in order, but they can also handle out-of-order collection just fine.

bq. The priority queue can be made faster by inlining (there is a patch for that, I can't get to the issue number now), but that's about the limit as far as I can see.

I think PQ is fundamentally not very friendly to modern CPUs, because of the hard-to-predict ifs; I think that's part of why the batch collection shows such gains. This doesn't hurt us so much during hit collection, which also uses PQ, since the queue typically converges quickly, but for OR scoring the PQ is used intensely the whole time.

> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html for the full discussion.
> The basic idea is to add variants to those two methods that return the current doc they are at, to save successive calls to doc(). If there are no more docs, return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing to -1 for improved performance.
> I will post a patch shortly
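
For readers following the thread, the proposal in the issue description could look roughly like the sketch below. It is only an illustration, not the attached patch or Lucene's actual API: the class SortedDocIterator, the constant NO_MORE_DOCS, and the helper collectFrom are hypothetical names, and the iterator simply walks a sorted int array. It assumes the -1 sentinel from the description (the comment above discusses Integer.MAX_VALUE as an alternative) and shows the '(doc = ...) >= 0' loop style from the last item of the summary.

```java
import java.util.ArrayList;
import java.util.List;

public class SentinelIteratorSketch {

    /** Hypothetical sentinel returned once the iterator is exhausted. */
    static final int NO_MORE_DOCS = -1;

    /** Hypothetical iterator over a sorted array of doc ids (not Lucene's DISI). */
    static final class SortedDocIterator {
        private final int[] docs;
        private int pos = -1; // positioned before the first doc

        SortedDocIterator(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Moves to the next doc and returns it, or NO_MORE_DOCS if none is left. */
        int nextDoc() {
            if (pos < docs.length) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves forward to the first doc >= target and returns it, or NO_MORE_DOCS. */
        int advance(int target) {
            int doc;
            // The return value carries both "is there a doc" and "which doc",
            // so no separate doc() call is needed.
            while ((doc = nextDoc()) >= 0 && doc < target) {
                // keep scanning
            }
            return doc;
        }
    }

    /** Collects every doc >= target, using the sign test instead of comparing to -1. */
    static List<Integer> collectFrom(SortedDocIterator it, int target) {
        List<Integer> out = new ArrayList<>();
        int doc = it.advance(target);
        while (doc >= 0) {
            out.add(doc);
            doc = it.nextDoc();
        }
        return out;
    }

    public static void main(String[] args) {
        SortedDocIterator it = new SortedDocIterator(new int[] {2, 5, 8, 13});
        System.out.println(collectFrom(it, 4)); // prints [5, 8, 13]
    }
}
```

With a boolean-returning next()/skipTo(), each step of the loop would need a second call to doc() to learn the current document; returning the doc directly folds the two calls into one.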