[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801240#action_12801240
 ] 

Dawid Weiss commented on LUCENE-2216:
-------------------------------------

Uff, I started doubting my own understanding; thanks for being patient 
with me.

I agree that having hashCode mutate the object's state is weird. I had some 
thoughts about it -- this particular mutation seems to be "safe" even from a 
multi-threaded point of view. If another thread sees a stale value of wlen, 
the only consequence is that it will scan more memory; for ands, ors and 
other such operations this has no effect. So assuming hashCode and equals are 
the ONLY methods you're calling concurrently, it shouldn't break things. A 
similar kind of trickery goes on in String#hashCode (caching to a 
non-volatile field), although that object is immutable, so it's a slightly 
different scenario.
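
For reference, the caching idiom mentioned above looks roughly like the sketch 
below. CachedHashWords is a made-up class for illustration (it is neither JDK 
nor Lucene code), and it assumes the wrapped array is never modified after 
construction, which is exactly what makes the unsynchronized write harmless.

```java
import java.util.Arrays;

// Hypothetical immutable wrapper, for illustration only.
final class CachedHashWords {
    private final long[] words; // never modified after construction
    private int hash;           // deliberately non-volatile; 0 means "not computed yet"

    CachedHashWords(long[] words) {
        this.words = words.clone();
    }

    @Override
    public int hashCode() {
        int h = hash;           // read the shared field exactly once
        if (h == 0) {
            h = Arrays.hashCode(words);
            hash = h;           // racy write, but every thread stores the same value
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashWords
            && Arrays.equals(words, ((CachedHashWords) o).words);
    }
}
```

A thread that sees a stale 0 simply recomputes the same value. With a mutable 
object such as a bit set, that argument only holds while nothing else changes 
the bits concurrently.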

To be honest, my preference for this would be to either maintain the wlen field 
during all operations (like java.util.BitSet) or at least to clearly state 
(JavaDoc?) that, for better performance, trimTrailingZeros() should be invoked 
before publishing the object to other threads (in case you fiddle with bits 
and clear the tail). With the second option, your patch does a fine job of 
not mutating the object and correcting the bug.
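
To make the "no mutation" point concrete, here is a minimal sketch of one way 
a hashCode can ignore trailing empty words without touching any field. 
TinyBitSet is a hypothetical stand-in for a long[]-backed bit set; it is not 
the Lucene class and I am not claiming it matches the attached patch. The 
idea is to start from zero at the highest word and use a mix (xor plus 
rotate) that maps zero to zero, so empty trailing words contribute nothing.

```java
// Hypothetical stand-in for a long[]-backed bit set, for illustration only.
final class TinyBitSet {
    private final long[] bits;

    TinyBitSet(int numBits) {
        bits = new long[Math.max(1, (numBits + 63) >>> 6)];
    }

    // Both methods assume 0 <= index < capacity.
    void set(int index)   { bits[index >>> 6] |= 1L << (index & 63); }
    void clear(int index) { bits[index >>> 6] &= ~(1L << (index & 63)); }

    @Override
    public int hashCode() {
        long h = 0;
        // Walk from the highest word down; h stays 0 until the first
        // non-zero word, so trailing zero words do not affect the result.
        for (int i = bits.length; --i >= 0;) {
            h ^= bits[i];
            h = Long.rotateLeft(h, 1);
        }
        // Fold to 32 bits; the constant keeps empty sets from hashing to 0.
        return (int) ((h >>> 32) ^ h) + 0x98761234;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TinyBitSet)) return false;
        long[] a = bits, b = ((TinyBitSet) o).bits;
        if (a.length < b.length) { long[] t = a; a = b; b = t; }
        // Words present only in the longer array must all be empty.
        for (int i = b.length; i < a.length; i++) if (a[i] != 0) return false;
        for (int i = 0; i < b.length; i++) if (a[i] != b[i]) return false;
        return true;
    }
}
```

With this, two sets of different capacity holding the same bits are equal and 
hash identically, and setting then clearing a high bit leaves the hash 
unchanged.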

Thanks for an interesting discussion.

> OpenBitSet#hashCode() may return false for identical sets.
> ----------------------------------------------------------
>
>                 Key: LUCENE-2216
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2216
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Other
>    Affects Versions: 2.9, 2.9.1, 3.0
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: LUCENE-2216.patch, openbitset.patch
>
>
> OpenBitSet uses an internal buffer of long variables to store set bits and an 
> additional 'wlen' index that points 
> to the highest used component inside {@link #bits} buffer.
> Unlike in the JDK, the wlen field is not continuously maintained (on clearing 
> bits, for example). This leads to a situation where wlen may point
> far beyond the last set bit. 
> The hashCode implementation iterates over all long components of the bits 
> buffer, rotating the hash even for empty components. This violates the 
> hashCode/equals contract. The following test case illustrates this:
> {code}
> // initialize two bitsets with different capacity (bits length).
> OpenBitSet bs1 = new OpenBitSet(200);
> OpenBitSet bs2 = new OpenBitSet(64);
> // set the same bit.
> bs1.set(3);
> bs2.set(3);
>
> // equals returns true (passes).
> assertEquals(bs1, bs2);
> // hash codes differ, so this assertion fails (against contract).
> assertEquals(bs1.hashCode(), bs2.hashCode());
> {code}
> Fix and test case attached.

