[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Shai Erera (JIRA) Tue, 26 May 2009 08:44:13 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713031#action_12713031
 ]


Shai Erera commented on LUCENE-1614:
------------------------------------

A question regarding BS.nextDoc(). I paste the current code (that I'm working 
on, for reference).

{code}
  public int nextDoc() throws IOException {
    // TODO: can remove more?
//    boolean more;
    do {
      while (bucketTable.first != null) {         // more queued
        current = bucketTable.first;
        bucketTable.first = current.next;         // pop the queue

        // check prohibited & required, and minNrShouldMatch
        if ((current.bits & prohibitedMask) == 0 &&
            (current.bits & requiredMask) == requiredMask &&
            current.coord >= minNrShouldMatch) {
          return doc = current.doc;
        }
      }

      // refill the queue
//      more = false;
      end += BucketTable.SIZE;
      for (SubScorer sub = scorers; sub != null; sub = sub.next) {
        Scorer scorer = sub.scorer;
        sub.collector.setScorer(scorer);
        int doc = scorer.docID();
        if (doc == -1) {
          doc = scorer.nextDoc();
        }
        while (doc < end) {
          sub.collector.collect(doc);
          doc = scorer.nextDoc();
        }
//        if (doc != NO_MORE_DOCS) {
//          more = true;
//        }
      }
    } while (bucketTable.first != null/* || more*/);

    return doc = NO_MORE_DOCS;
  }
{code}

I wanted to get rid of 'more', following all the changes I'm doing to DISI. I 
did it and all tests pass, but I want to double-check my understanding, since I 
don't know for sure if there is a test that tests my change.

As far as I see it, there are two code sections: (1) iterates on bucketTable 
until it is exhausted (i.e., calls to nextDoc() will first consume bucketTable) 
and (2) iterate on all sub scorers. The second iteration populates the bucket 
table from all sub-scorers and reiterates, consuming bucket table again, in 
calls to nextDoc(). Then, at some point, all sub scorers don't have anything 
more to collect, and bucketTable.first is null for the last time, at which 
point nextDoc() returns NO_MORE_DOCS.

So it looks like I can indeed get rid of 'more'. But what puzzles me is why it 
was there in the first place. Leaving the code as-is means that someone thought 
of a case where a sub-scorer will have more documents to collect, but still 
after the 2nd code segment the bucketTable was empty. That's why I'm not sure I 
can remove 'more' safely - i.e., is it possible that a sub-scorer will have 
more documents, however all the docs that were collected by sub.collector will 
not affect the bucketTable? It doesn't look like in the code.

Same question for BS.score(Collector collector, int max).

> Add next() and skipTo() variants to DocIdSetIterator that return the current 
> doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch, LUCENE-1614.patch, LUCENE-1614.patch
>
>
> See 
> http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
>  for the full discussion. The basic idea is to add variants to those two 
> methods that return the current doc they are at, to save successive calls to 
> doc(). If there are no more docs, return -1. A summary of what was discussed 
> so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI 
> (calls next() and skipTo() respectively, and will be changed to abstract in 
> 3.0).
> #* I actually would like to propose an alternative to the names: advance() 
> and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead 
> of comparing to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Reply via email to