[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Michael McCandless (JIRA) Thu, 21 May 2009 09:40:17 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711682#action_12711682
 ]


Michael McCandless commented on LUCENE-1614:
--------------------------------------------

bq. If they are all on -1 to start with, they are already all sorted.

Right but that defeats the optimization.  I'm talking about this code in 
ConjunctionScorer:
{code}
Arrays.sort(scorers, new Comparator() {         // sort the array
    public int compare(Object o1, Object o2) {
      return ((Scorer)o1).doc() - ((Scorer)o2).doc();
    }
  });

doNext();

// If first-time skip distance is any predictor of
// scorer sparseness, then we should always try to skip first on
// those scorers.
// Keep last scorer in it's last place (it will be the first
// to be skipped on), but reverse all of the others so that
// they will be skipped on in order of original high skip.
int end=(scorers.length-1);
for (int i=0; i<(end>>1); i++) {
  Scorer tmp = scorers[i];
  scorers[i] = scorers[end-i-1];
  scorers[end-i-1] = tmp;
}
{code}

Ie it sets things up so that "typically" the rarest sub-scorer drives the 
intersection.  If they are all on -1 then this heuristic won't work.

{quote}
We could do some smart sorting in the constructor so that we skip in cheap and 
fast scorers first (TermScorers first, ordered by df, followed by simple 
conjunctions of terms, followed by other more expensive stuff like sloppy 
phrase queries and complex boolean queries. Perhaps in the future, even a 
method on Scorer that estimates it's cost?
{quote}

Right, we'd need to do something along these lines if we switch DISI to start 
with doc() = -1.

> Add next() and skipTo() variants to DocIdSetIterator that return the current 
> doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch, LUCENE-1614.patch
>
>
> See 
> http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
>  for the full discussion. The basic idea is to add variants to those two 
> methods that return the current doc they are at, to save successive calls to 
> doc(). If there are no more docs, return -1. A summary of what was discussed 
> so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI 
> (calls next() and skipTo() respectively, and will be changed to abstract in 
> 3.0).
> #* I actually would like to propose an alternative to the names: advance() 
> and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead 
> of comparing to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Reply via email to