[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Shai Erera (JIRA) Wed, 27 May 2009 13:42:20 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713723#action_12713723
 ]


Shai Erera commented on LUCENE-1614:
------------------------------------

Regarding the nocommit in ConjunctionScorer's ctor - I think the problem with 
moving doNext() to the end of the ctor is the order of the scorers that will be 
iterated on. I.e., currently the scorers are first advanced to their first doc, 
then sorted based on their docID, then advanced to the doc ID that all agree 
on, in the order of the sort.

If you move doNext() to the end of the method, then the scorers are sorted 
based on their docID, and then immediately re-sorted (based on their 
sparseness, which is a heuristic applied based on their first docID).

If I understand correctly, the problem with calling doNext() after the re-sort 
is this: assume you have 5 scorers, whose first docs are 1, 2, 3, 5, 5 
respectively. Sorting leaves the array as is (or changes it to be in that 
order). If you call doNext() after array.sort, then you advance all the first 
scorers to 5 (or a larger doc ID they all agree on). However, if you do the 
re-sort, the order will be 5, 3, 2, 1, 5 and then if you call doNext(), it will 
stop immediately, since the first scorer's docs equals the last one. So you 
break an invariant, that after calling doNext() all scorers are on the same doc 
ID.

That's at least how I understand it. I hope I didn't write too much to explain 
the obvious (to others).

So I'll delete the nocommit tag.

Regarding nocommit in ReqExclScorer, you're right. I'll remove firstTime 
entirely.

bq. Can you add an "assert scorer.docID() == -1" in IndexSearcher right when it 
gets the scorer?

Done.

bq. BS2.nextDoc is still needing to check if it's supposed to call 
initCountingSumScorer? Can we do this in ctor?

It can, if I delete add() and do all the work in the ctor. I'll need to pass 3 
arrays (Scorer[], boolean[] /*required*/, boolean[] /*prohibited*/) though, or 
pass the list of Weights, Clauses and IndexReader. I'm not too fan of either. I 
check which is the lesser evil.

bq. BooleanScorer2.score(Collector, int) makes me a bit nervous

I didn't think it's such a big deal since it's one 'if' before scoring starts. 
It's not like it's called for every score(). Do you think we should resolve it?

bq. Hmm... maybe we permit DISI.docID() to also return NO_MORE_DOCS after the 
DISI is created? EG we do this in ConjunctionScorer. (Or, maybe, any docID 
that's < the first real docID...hmmm.)

I do not fully follow you. By allowing to return -1, we already allow any DISI 
to return a doc ID that's < the first real doc ID. And allowing to return 
NO_MORE_DOCS looks strange to me .. I mean imagine a code which creates a DISI 
and calls doc() and gets back NO_MORE_DOCS .. that'd be strange, won't it?

> Add next() and skipTo() variants to DocIdSetIterator that return the current 
> doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch, 
> LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch
>
>
> See 
> http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
>  for the full discussion. The basic idea is to add variants to those two 
> methods that return the current doc they are at, to save successive calls to 
> doc(). If there are no more docs, return -1. A summary of what was discussed 
> so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI 
> (calls next() and skipTo() respectively, and will be changed to abstract in 
> 3.0).
> #* I actually would like to propose an alternative to the names: advance() 
> and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead 
> of comparing to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Reply via email to