[
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711533#action_12711533
]
Shai Erera commented on LUCENE-1614:
------------------------------------
Hmmm ... there is a problem with MAX_VAL as sentinel:
# OpenBitSetIterator already has a nextDoc() method which returns -1 when
exhausted. The current patch used -1 as sentinel, and therefore it was ok. But
when I changed it to MAX_VAL, test-tag fail on TestOpenBitSet. We can document
this change in "changes to back-compat policy" and fix the test on trunk and
tag.
# SortedVIntList returns an iterator which is a DISI. MAX_VAL is considered a
valid value for SortedVIntList (TestSortedVIntList.test03() validates that). So
if we use it as sentinel, we declare that MAX_VAL is invalid for
SortedVIntList. Not sure if we can do that.
I think the second is the problematic one - how do we handle iterators for
which MAX_VAL is a valid value? I tend to say MAX_VAL should not be a valid
value since by the name, *DocId*SetIterator, we should return doc Ids, and
MAX_VAL is used as sentinel elsewhere and is not even considered a valid doc
Id. Therefore those iterators should change their logic, if they rely on
MAX_VAL being a valid value.
BTW, besides the convenience, why should SortedVIntList expose a
DocIdSetIterator? Nothing implies the list will hold doc Ids, therefore why
commit to an iterator which returns doc Ids? If it's for convenience only, then
maybe wrap that iterator with a true DISI where the Lucene code will need a
true DISI?
What do you think?
> Add next() and skipTo() variants to DocIdSetIterator that return the current
> doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1614
> URL: https://issues.apache.org/jira/browse/LUCENE-1614
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Shai Erera
> Fix For: 2.9
>
> Attachments: LUCENE-1614.patch
>
>
> See
> http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
> for the full discussion. The basic idea is to add variants to those two
> methods that return the current doc they are at, to save successive calls to
> doc(). If there are no more docs, return -1. A summary of what was discussed
> so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI
> (calls next() and skipTo() respectively, and will be changed to abstract in
> 3.0).
> #* I actually would like to propose an alternative to the names: advance()
> and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead
> of comparing to -1 for improved performance.
> I will post a patch shortly
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]