[ 
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704556#action_12704556
 ] 

Michael McCandless commented on LUCENE-1614:
--------------------------------------------

bq. Just to clarify for myself, in the example I gave above, suppose that the 
scorer is on "3" and you call check(8).

On check(8), TermScorer would go to 10, stop there, and return false.  (It 
would not "rewind" to 3).  check() can only be called with increasing 
arguments, so it's not truly "random access".  It's "forward only random 
access".
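
A minimal sketch of those semantics, assuming a scorer whose postings are just 
a sorted int[] of docIDs.  SortedDocIterator and its methods are hypothetical 
names for illustration, not the TermScorer API:

```java
/** Hypothetical forward-only iterator over sorted docIDs (illustration only). */
class SortedDocIterator {
    private final int[] docs; // docIDs in increasing order
    private int pos = -1;     // index of the current doc; -1 before the first

    SortedDocIterator(int... docs) {
        this.docs = docs;
    }

    /** Current docID, or -1 if not yet positioned or already exhausted. */
    int doc() {
        return (pos >= 0 && pos < docs.length) ? docs[pos] : -1;
    }

    /** Moves to the next doc and returns it, or -1 when there are no more. */
    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return doc();
    }

    /**
     * Forward-only check: moves to the first doc >= target, never backwards,
     * and reports whether that doc is exactly the target.
     */
    boolean check(int target) {
        while (pos < docs.length && (pos < 0 || docs[pos] < target)) {
            pos++;
        }
        return pos < docs.length && docs[pos] == target;
    }
}
```

With docs {3, 10, 15} and the iterator sitting on 3, check(8) returns false and 
leaves the iterator on 10, so a later check(10) returns true without moving.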

bq. You propose this check() so that in case a DISI can save any extra 
operations it does in next() (such as reading a payload for example) it will do 
so. Therefore in the example you give above with CS, next()'s contract forces 
it to advance all the sub-scorers, but with check() it could stop in the middle.

Precisely.

This is important when you have a super-cheap iterator (say a somewhat sparse 
(<=10%?) in-memory filter that's represented as list-of-docIDs).  It's very 
fast for such a filter to iterate over its docIDs.  But when that iterator is 
AND'd with a Scorer, as is done today by IndexSearcher, they effectively play 
"leap frog", where first it's the filter's turn to next(), then it's the 
Scorer's turn, etc.  But for the Scorer, next() can be extremely costly, only 
to find the filter doesn't accept the doc it landed on.  So for such situations 
it's better to let the filter drive the search, calling Scorer.check() on each 
doc the filter accepts.
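
Roughly, the filter-driven loop could look like the sketch below.  Checkable 
and FilterDrivenSearch are made-up names standing in for a Scorer that exposes 
check(); the real IndexSearcher loop is not shown here, and the "scorer" is 
faked with a sorted array of docIDs:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Scorer that supports a forward-only check(). */
interface Checkable {
    boolean check(int target);
}

class FilterDrivenSearch {

    /** Builds a toy Checkable over sorted docIDs (assumption: stands in for a real Scorer). */
    static Checkable over(final int... sortedDocs) {
        return new Checkable() {
            private int pos = 0;

            public boolean check(int target) {
                // Move forward only, stopping at the first doc >= target.
                while (pos < sortedDocs.length && sortedDocs[pos] < target) {
                    pos++;
                }
                return pos < sortedDocs.length && sortedDocs[pos] == target;
            }
        };
    }

    /** The cheap filter iterates its docIDs; the scorer is only asked to check them. */
    static List<Integer> search(int[] filterDocs, Checkable scorer) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int doc : filterDocs) {
            if (scorer.check(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

The scorer never has to land on a "next" doc of its own choosing; it only 
answers for docs the filter has already accepted.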

But... once we switch to filter-as-BooleanClause, it's less clear whether 
check() is worthwhile, because I think the filter's constraint is more 
efficiently taken into account.

For filters that support random access (if they are less sparse, say >= 25% or 
so), we should push them all the way down to the TermScorers and factor them in 
just like deletedDocs.
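
As a sketch of what "factor them in just like deletedDocs" could mean, assume 
both the deletions and the filter are available as java.util.BitSet.  
FilteredPostings is a hypothetical class, not how TermScorer is actually 
written:

```java
import java.util.BitSet;

/** Hypothetical postings iterator that applies deletions and a random-access filter together. */
class FilteredPostings {
    private final int[] postings;  // docIDs for one term, in increasing order
    private final BitSet deleted;  // bit set => doc is deleted
    private final BitSet accepted; // bit set => filter accepts the doc
    private int pos = -1;

    FilteredPostings(int[] postings, BitSet deleted, BitSet accepted) {
        this.postings = postings;
        this.deleted = deleted;
        this.accepted = accepted;
    }

    /** Returns the next live doc the filter accepts, or -1 when there are no more. */
    int nextDoc() {
        while (pos + 1 < postings.length) {
            pos++;
            int doc = postings[pos];
            // Same cheap bit test for both constraints, done at the lowest level.
            if (!deleted.get(doc) && accepted.get(doc)) {
                return doc;
            }
        }
        return -1;
    }
}
```

Docs rejected by the filter are dropped at the same point as deleted docs, 
before any higher-level scorer has to look at them.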

bq. If the default impl in DISI just uses nextDoc() and returns true if the 
return value is the requested, we should be safe back-compat-wise, but this is 
still dangerous and we need clear documentation.

Yes, it does have a good default impl, I think.
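
One way such a default could be layered on nextDoc() is sketched below.  
DocIterator and fetchNext() are hypothetical names and this is not the 
DocIdSetIterator code; it also assumes -1 means "no more docs", as in the issue 
description:

```java
/** Hypothetical base class showing a default check() built only on nextDoc(). */
abstract class DocIterator {
    private int current = -1;          // -1 before the first doc and after the last
    private boolean exhausted = false;

    /** Subclasses return the next docID in increasing order, or -1 when none are left. */
    protected abstract int fetchNext();

    /** Advances by one doc and returns it, or -1 when exhausted. */
    final int nextDoc() {
        if (!exhausted) {
            current = fetchNext();
            exhausted = (current == -1);
        }
        return current;
    }

    /**
     * Default impl: step forward with nextDoc() until reaching a doc >= target,
     * then report whether it is the target.  Subclasses may override this to
     * skip work that nextDoc() would be forced to do.
     */
    boolean check(int target) {
        while (!exhausted && current < target) {
            nextDoc();
        }
        return !exhausted && current == target;
    }
}
```

Existing subclasses would get correct (if not faster) behavior for free, which 
is the back-compat point; the forward-only contract still needs to be spelled 
out in the javadocs.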

bq. BTW, perhaps a testAndSet-like version can save check(10) followed by a 
next(10), and will fit nicer?

Not sure what you mean by "testAndSet-like version"?

> Add next() and skipTo() variants to DocIdSetIterator that return the current 
> doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>
> See 
> http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
>  for the full discussion. The basic idea is to add variants to those two 
> methods that return the current doc they are at, to save successive calls to 
> doc(). If there are no more docs, return -1. A summary of what was discussed 
> so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI 
> (calls next() and skipTo() respectively, and will be changed to abstract in 
> 3.0).
> #* I actually would like to propose an alternative to the names: advance() 
> and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead 
> of comparing to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
