Hi

I think we can make an optimization to DocIdSetIterator. Today, it defines
next() and skipTo(int), which return a boolean. I've checked the code and it
looks like these two calls are almost always followed by a call to doc().

I was thinking that if those two returned the doc Id they are at, instead of
a boolean, that would save the call to doc(). Callers can then:
* Compare the returned doc to a NO_MORE_DOCS constant (set to -1), to
understand there are no more docs in this iterator.
* If skipTo() is called, compare the 'target' to the returned Id, and if
they are not the same, save the returned Id. The next time a skipTo is
requested, they don't perform it if the saved Id is already at or past the
new target. If it's not possible to save it, they can call doc() to get
that information.
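
To make the two points above concrete, here is a rough sketch of what I have
in mind. It is not Lucene code: ArrayDocIdIterator and Seeker are made-up
names for illustration, the iterator is backed by a plain sorted int[], and
I assume skipToDoc always advances at least one position.

```java
import java.util.ArrayList;
import java.util.List;

public class DocIdIteratorSketch {
    // Proposed sentinel; the value -1 is the one suggested in this mail.
    static final int NO_MORE_DOCS = -1;

    // Hypothetical iterator over a sorted array of non-negative doc Ids.
    static final class ArrayDocIdIterator {
        private final int[] docs;
        private int pos = -1;

        ArrayDocIdIterator(int... docs) { this.docs = docs; }

        // Advances one position and returns the doc Id, or NO_MORE_DOCS.
        int nextDoc() {
            if (pos < docs.length) pos++;
            return doc();
        }

        // Advances at least once, then until a doc Id >= target is found.
        int skipToDoc(int target) {
            do {
                if (pos < docs.length) pos++;
            } while (pos < docs.length && docs[pos] < target);
            return doc();
        }

        // Still available; in this sketch it returns NO_MORE_DOCS before
        // the first advance and after exhaustion.
        int doc() {
            return (pos >= 0 && pos < docs.length) ? docs[pos] : NO_MORE_DOCS;
        }
    }

    // First bullet: loop on the returned Id, no separate doc() call.
    static List<Integer> collectAll(ArrayDocIdIterator it) {
        List<Integer> out = new ArrayList<>();
        int doc;
        while ((doc = it.nextDoc()) != NO_MORE_DOCS) out.add(doc);
        return out;
    }

    // Second bullet: remember the last returned Id and skip only if needed.
    static final class Seeker {
        private final ArrayDocIdIterator it;
        private boolean started = false;
        private int current = NO_MORE_DOCS;
        int skipCalls = 0; // counts real skipToDoc calls, for illustration

        Seeker(ArrayDocIdIterator it) { this.it = it; }

        int seek(int target) {
            // Exhausted, or already at/past the target: nothing to do.
            if (started && (current == NO_MORE_DOCS || current >= target)) {
                return current;
            }
            started = true;
            skipCalls++;
            current = it.skipToDoc(target);
            return current;
        }
    }

    public static void main(String[] args) {
        System.out.println(collectAll(new ArrayDocIdIterator(2, 5, 9)));
        Seeker s = new Seeker(new ArrayDocIdIterator(2, 5, 9));
        System.out.println(s.seek(4) + " " + s.seek(5) + " " + s.seek(6));
    }
}
```

Note that with -1 as the sentinel, the "already past the target" check needs
an explicit test for NO_MORE_DOCS, since -1 is smaller than any valid target.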

The way I see it, the impls that still need to call doc() will lose
nothing. All we'll do is change the 'if' from comparing a boolean to
comparing ints (even though that's a bit less efficient than comparing
booleans). The impls that call doc() only because all they have in hand is a
boolean will gain.

Obviously we can't change those methods' signatures, so we can deprecate them
and introduce nextDoc() and skipToDoc(int target). We should still keep
doc() around though.
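
One possible way to arrange the deprecation, just as a sketch: the old
boolean methods stay and delegate to the new ones. SketchIterator and
RangeIterator are hypothetical names, not existing classes, and the real
change might well wire the delegation differently.

```java
public class DeprecationSketch {
    // Proposed sentinel; -1 as suggested in this mail.
    static final int NO_MORE_DOCS = -1;

    // Hypothetical stand-in for DocIdSetIterator with both method pairs.
    static abstract class SketchIterator {
        abstract int nextDoc();
        abstract int skipToDoc(int target);
        abstract int doc();

        /** @deprecated use nextDoc() and compare to NO_MORE_DOCS. */
        @Deprecated
        boolean next() { return nextDoc() != NO_MORE_DOCS; }

        /** @deprecated use skipToDoc(int) and compare to NO_MORE_DOCS. */
        @Deprecated
        boolean skipTo(int target) { return skipToDoc(target) != NO_MORE_DOCS; }
    }

    // Toy implementation: iterates doc Ids 0 .. maxDoc-1.
    static final class RangeIterator extends SketchIterator {
        private final int maxDoc;
        private int next = 0;            // next Id to hand out
        private int doc = NO_MORE_DOCS;  // Id the iterator is currently on

        RangeIterator(int maxDoc) { this.maxDoc = maxDoc; }

        @Override int nextDoc() {
            doc = next < maxDoc ? next++ : NO_MORE_DOCS;
            return doc;
        }

        @Override int skipToDoc(int target) {
            if (target > next) next = target;
            return nextDoc();
        }

        @Override int doc() { return doc; }
    }

    public static void main(String[] args) {
        RangeIterator it = new RangeIterator(3);
        // Old-style caller keeps working through the deprecated methods.
        while (it.next()) System.out.println(it.doc());
    }
}
```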

What do you think? If you agree to this change, I can add them to 1593, or
create a new issue (I prefer the latter so that 1593 will be focused on the
changes to Collectors).

Shai
