[
https://issues.apache.org/jira/browse/LUCENE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793637#comment-13793637
]
Shai Erera commented on LUCENE-5272:
------------------------------------
I was about to post an updated patch when I realized that now {{OBS.length()}}
is broken. It currently returns {{bits.length << 6}}, which is completely
unrelated to neither {{wlen}} nor {{numBits}}. Bits.length() javadocs say:
"Returns the number of bits in this set", what should we return?
* If we want to adhere to the jdocs, we should return {{numBits}}, but then
numBits cannot be an assertion-only member.
* If we want to return {{wlen << 6}}, then we should nuke {{numBits}} because
otherwise you will call {{length()}} to receive e.g 64 but when you will call
{{fastSet(52)}} you may trip the assertion error.
* If we return bits.length, then we can nuke both wlen and numBits.
Actually, OpenBitSetIterator is also broken, because it iterates up to wlen,
again with no relation to numBits. And of course it's now unrelated to
bits.length too, so e.g. if you call OBS.length() and compares that with the
number of nextDoc() calls the iterator returns (say when all bits are 'set'),
the numbers are not equal...
Why do we need these two fields? As far as I can tell, wlen is only <
bits.length when someone shares a long[] but limit its size. Do we really care
about this case? I can understand the reason to keep {{numBits}} because e.g.
you want to adhere to Bits spec:
* Return numBits from length()
* Bits.get(int) document that you should not call it with out of bounds indexes
* Iterator should iterate up to the last set/cleared bit
But it has to be a first-class member, not an assert-only one.
Another quirkiness, what should the app expect to happen if it calls
{{obs.get(170)}} (and say bits.length=1)? It currently receives false and not
e.g. AIOOBE. But this could falsly lead to calling fastSet(170) thinking that
the bitset is at the right size. I'm not saying this is a super-duper usecase
we should handle, since if an app calls fastSet, it should also call fastGet,
but I think it will be good if we are consistent about when you get false and
when you hit out-of-bounds.
The out-of-bounds here are also related to 'wlen', as the method checks if the
bit is within bits.length range, not wlen range. This is another bug I think
since if you share a bits[] which has say bit 170 set, but you limit it to 2
words, calling OBS.get(170) will false return true?
I think this class could use some serious overhauling and re-thinking. I find
it very confusing as it is written now. We should decide what are the contracts
of each method, and then implement it as such.
And though unrelated to the bugs reported in this issue, perhaps someone can
explain why we need both the 'int' and 'long' method variants? Can't we have
only the long indexes?
> OpenBitSet.ensureCapacity does not modify numBits
> -------------------------------------------------
>
> Key: LUCENE-5272
> URL: https://issues.apache.org/jira/browse/LUCENE-5272
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Reporter: Shai Erera
> Attachments: LUCENE-5272.patch
>
>
> It's a simple bug, reproduced by this simple test:
> {code}
> public void testEnsureCapacity() {
> OpenBitSet bits = new OpenBitSet(1);
> bits.fastSet(0);
> bits.ensureCapacity(5); // make room for more bits
> bits.fastSet(2);
> }
> {code}
> The problem is that {{numBits}} which is used only for assrets isn't modified
> by ensureCapacity and so the next fastSet trips the assert. I guess we should
> also fix ensureCapacityWords and test it too.
> I may not be able to fix this until Sunday though, so if anyone wants to fix
> it before (maybe it can make it into 4.5.1), feel free.
--
This message was sent by Atlassian JIRA
(v6.1#6144)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]