[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Michael McCandless (JIRA) Wed, 27 May 2009 15:47:21 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713732#action_12713732
 ]


Michael McCandless commented on LUCENE-1614:
--------------------------------------------

{quote}
Regarding the nocommit in ConjunctionScorer's ctor - I think the problem with 
moving doNext() to the end of the ctor is the order of the scorers that will be 
iterated on. I.e., currently the scorers are first advanced to their first doc, 
then sorted based on their docID, then advanced to the doc ID that all agree 
on, in the order of the sort.
If you move doNext() to the end of the method, then the scorers are sorted 
based on their docID, and then immediately re-sorted (based on their 
sparseness, which is a heuristic applied based on their first docID).

If I understand correctly, the problem with calling doNext() after the re-sort 
is this: assume you have 5 scorers, whose first docs are 1, 2, 3, 5, 5 
respectively. Sorting leaves the array as is (or changes it to be in that 
order). If you call doNext() after array.sort, then you advance all the first 
scorers to 5 (or a larger doc ID they all agree on). However, if you do the 
re-sort, the order will be 5, 3, 2, 1, 5 and then if you call doNext(), it will 
stop immediately, since the first scorer's docs equals the last one. So you 
break an invariant, that after calling doNext() all scorers are on the same doc 
ID.

That's at least how I understand it. I hope I didn't write too much to explain 
the obvious (to others).

So I'll delete the nocommit tag.
{quote}

Ahh -- good explanation!  Phew.  This must be what leads to the two
test failures.  So leave it where it is (maybe add a comment
explaining it cannot be moved down, lest anyone else gets tempted).

{quote}
> Can you add an "assert scorer.docID() == -1" in IndexSearcher right when it 
> gets the scorer?

Done.
{quote}

OK, except we may delete it depending on the relaxing (below)...

{quote}
> BS2.nextDoc is still needing to check if it's supposed to call 
> initCountingSumScorer? Can we do this in ctor?

It can, if I delete add() and do all the work in the ctor.
{quote}

I think we should (lacking a DISI.start).

bq. I'll need to pass 3 arrays (Scorer[], boolean[] /required/, boolean[] 
/prohibited/) though, or pass the list of Weights, Clauses and IndexReader. I'm 
not too fan of either. I check which is the lesser evil.

Hmm -- I don't have a stronger lesser-of-evils preference.  I suppose,
instead, we could add a "start" only to BS2, which BQ would call once
it's done adding?  Or, simply call initCountingSumScorer from BQ and
don't do the check per nextDoc/advance.

{quote}
> BooleanScorer2.score(Collector, int) makes me a bit nervous

I didn't think it's such a big deal since it's one 'if' before scoring starts. 
It's not like it's called for every score(). Do you think we should resolve it?
{quote}

What specifically made me nervous that it was even necessary to add
this (having to conditionally next wasn't needed before) in the first
place.  If you remove it, does something actually break?  Like what
caused it to be added?  (Because I want to go explore/understand
*that* cause).

{quote}
> Hmm... maybe we permit DISI.docID() to also return NO_MORE_DOCS after the 
> DISI is created? EG we do this in ConjunctionScorer. (Or, maybe, any docID 
> that's < the first real docID...hmmm.)

I do not fully follow you. By allowing to return -1, we already allow any DISI 
to return a doc ID that's < the first real doc ID. And allowing to return 
NO_MORE_DOCS looks strange to me .. I mean imagine a code which creates a DISI 
and calls doc() and gets back NO_MORE_DOCS .. that'd be strange, won't it?
{quote}

I'm thinking it's OK to call docID before next, and that all we
require is that call return a docID less than its first real docID.

And, if you know up-front you have no docs, returning NO_MORE_DOCS up
front is also OK.

Ie this is a relaxation over "you must return -1 before nextDoc has
been called".

(EG I think the last patch would return NO_MORE_DOCS from docID() in
ConjunctionScorer if it determines in ctor that no docs match).


> Add next() and skipTo() variants to DocIdSetIterator that return the current 
> doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch, 
> LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch
>
>
> See 
> http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
>  for the full discussion. The basic idea is to add variants to those two 
> methods that return the current doc they are at, to save successive calls to 
> doc(). If there are no more docs, return -1. A summary of what was discussed 
> so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI 
> (calls next() and skipTo() respectively, and will be changed to abstract in 
> 3.0).
> #* I actually would like to propose an alternative to the names: advance() 
> and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead 
> of comparing to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Reply via email to