[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Michael McCandless (JIRA) Tue, 19 May 2009 13:52:19 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710880#action_12710880
 ]


Michael McCandless commented on LUCENE-1614:
--------------------------------------------

bq. About using Integer.MAX_VALUE as sentinel, did anyone consider what happens 
when the first index actually reaches that number of documents?

Lucene already uses Integer.MAX_VALUE as a sentinel (e.g. in the 
score(Collector) methods of TermScorer, BooleanScorer and BooleanScorer2), so 
a Lucene index can already hold at most Integer.MAX_VALUE docs.
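
To illustrate why a large sentinel is convenient, here is a minimal sketch, 
not Lucene code. The class name, the int[] backing store and the NO_MORE_DOCS 
constant are assumptions made for this example; the method names follow the 
ones proposed in the issue. Because the sentinel compares greater than every 
real doc ID, the skip loop needs no separate "exhausted" check.

```java
// Hypothetical sketch of a doc ID iterator that returns the current doc
// (or a sentinel) instead of a boolean. Not Lucene's actual classes.
final class SentinelIteratorSketch {
    // Assumed sentinel: real doc IDs are always smaller than this value.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // doc IDs, sorted ascending
    private int pos = -1;

    SentinelIteratorSketch(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    // Moves to the next doc and returns it, or NO_MORE_DOCS when exhausted.
    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    // Moves to the first doc >= target that lies beyond the current position.
    // The sentinel ends the loop by itself, since it is never < target.
    int advance(int target) {
        int doc;
        while ((doc = nextDoc()) < target) {
            // keep scanning
        }
        return doc;
    }

    // Typical caller loop: one call and one comparison per document.
    static int countDocs(int[] sortedDocs) {
        SentinelIteratorSketch it = new SentinelIteratorSketch(sortedDocs);
        int count = 0;
        while (it.nextDoc() != NO_MORE_DOCS) {
            count++;
        }
        return count;
    }
}
```

The cost of this choice is the one noted above: Integer.MAX_VALUE itself can 
never be a valid doc ID.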

bq. On moving from the priority queue (DisjunctionSumScorer/BooleanScorer2) to 
the batch approach (BooleanScorer): I did not find a way to do that while 
scoring docs in docId order. 

What breaks if we allow docs to be collected out-of-order (besides external 
Hit/Collector)?  As of LUCENE-1575, the core collectors can gain performance if 
they know the docs will be collected in order, but they can also handle 
out-of-order collection just fine.

bq. The priority queue can be made faster by inlining (there is a patch for 
that, I can't get to the issue number now), but that's about the limit as far 
as I can see.

I think PQ is fundamentally not very friendly to modern CPUs, because of the 
hard-to-predict ifs; I think that's part of why the batch collection shows such 
gains.

This doesn't hurt us so much during hit collection, which also uses PQ, since 
the queue typically quickly converges, but for OR scoring the PQ is intensely 
used the whole time.
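
As a rough illustration of the batch idea, here is a simplified sketch, 
loosely modeled on the bucket approach and not the actual BooleanScorer code. 
The class name, the WINDOW size and the int[][] postings input are 
assumptions for this example, and it counts matching clauses instead of 
summing scores. Each clause is drained into a small bucket array one window 
of doc IDs at a time, so the inner loop is plain array updates rather than 
heap sifts. Buckets are then read back through a linked list, which is why 
docs can come out of order within a window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of windowed (batch) OR matching. Assumes each postings
// array is sorted ascending and holds non-negative doc IDs < Integer.MAX_VALUE.
final class BatchOrSketch {
    static final int WINDOW = 8; // arbitrary small window for illustration

    // Returns {doc, matchingClauseCount} pairs in the order they are collected.
    static List<int[]> collect(int[][] postings) {
        List<int[]> hits = new ArrayList<>();
        int[] pos = new int[postings.length]; // read position per clause
        int[] count = new int[WINDOW];        // matches per bucket
        int[] next = new int[WINDOW];         // linked list of touched buckets
        while (true) {
            // Pick the window that contains the smallest unread doc ID.
            int min = Integer.MAX_VALUE;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] < postings[i].length) {
                    min = Math.min(min, postings[i][pos[i]]);
                }
            }
            if (min == Integer.MAX_VALUE) {
                break; // every clause is exhausted
            }
            int base = min - (min % WINDOW);
            long end = (long) base + WINDOW; // long avoids int overflow
            int first = -1;
            // Drain each clause into the buckets of the current window.
            for (int i = 0; i < postings.length; i++) {
                int[] p = postings[i];
                while (pos[i] < p.length && p[pos[i]] < end) {
                    int slot = p[pos[i]++] - base;
                    if (count[slot]++ == 0) { // first touch: link at the head
                        next[slot] = first;
                        first = slot;
                    }
                }
            }
            // Collect touched buckets in list order, which is not doc ID order.
            for (int slot = first; slot != -1; slot = next[slot]) {
                hits.add(new int[] {base + slot, count[slot]});
                count[slot] = 0; // reset for the next window
            }
        }
        return hits;
    }
}
```

For the clauses {1, 9}, {3, 9} and {1, 20}, this sketch collects doc 3 before 
doc 1 in the first window, so a collector fed this way has to tolerate 
out-of-order docs.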


> Add next() and skipTo() variants to DocIdSetIterator that return the current 
> doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch
>
>
> See 
> http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
>  for the full discussion. The basic idea is to add variants to those two 
> methods that return the current doc they are at, to save successive calls to 
> doc(). If there are no more docs, return -1. A summary of what was discussed 
> so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI 
> (calls next() and skipTo() respectively, and will be changed to abstract in 
> 3.0).
> #* I actually would like to propose an alternative to the names: advance() 
> and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead 
> of comparing to -1 for improved performance.
> I will post a patch shortly

