[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801249#action_12801249 ]
Yonik Seeley edited comment on LUCENE-2216 at 1/16/10 6:34 PM: --------------------------------------------------------------- bq. I agree that having hashCode mutate the object's state is weird. I had some thoughts about it - this particular mutation seems to be "safe" even from multi-threaded point of view. If another thread sees a stale value of wlen, then the only thing that is going to happen is it will scan more memory; There are still quite a few things that can go wrong I think. If all threads *only* called hashCode and equals, then you *might* be right... it's very specific to the implementation of trimTrailingZeros() {code} public void trimTrailingZeros() { int idx = wlen-1; while (idx>=0 && bits[idx]==0) idx--; wlen = idx+1; } {code} What could make that work is the fact that wlen is an integer, is never directly used as the loop counter, or as an index into the array. But the other big questions: are other read operations tolerant of wlen changing out from under them? My guess would be no. Look at xorCount for example: {code} if (a.wlen < b.wlen) { tot += BitUtil.pop_array(b.bits, a.wlen, b.wlen-a.wlen); {code} hashCode and equals changing wlen could cause a negative value to be passed to pop_array. edit: deleted second example, which isn't different from the first (the issue is safety with other read ops). was (Author: ysee...@gmail.com): bq. I agree that having hashCode mutate the object's state is weird. I had some thoughts about it - this particular mutation seems to be "safe" even from multi-threaded point of view. If another thread sees a stale value of wlen, then the only thing that is going to happen is it will scan more memory; There are still quite a few things that can go wrong I think. If all threads *only* called hashCode and equals, then you *might* be right... it's very specific to the implementation of trimTrailingZeros() {code} public void trimTrailingZeros() { int idx = wlen-1; while (idx>=0 && bits[idx]==0) idx--; wlen = idx+1; } {code} What could make that work is the fact that wlen is an integer, is never directly used as the loop counter, or as an index into the array. But the other big questions: are other read operations tolerant of wlen changing out from under them? My guess would be no. Look at xorCount for example: {code} if (a.wlen < b.wlen) { tot += BitUtil.pop_array(b.bits, a.wlen, b.wlen-a.wlen); {code} hashCode and equals changing wlen could cause a negative value to be passed to pop_array. Yet another problem: say someone actually does want to change the set occasionally. One way they could safely do this is to use a read-write lock to allow any number of readers to read the set as long as a writer wasn't writing it. But equals and hashCode would need to be categorized under "write" methods for this to work... (definitely unexpected) otherwise all sorts of bad stuff would happen. > OpenBitSet#hashCode() may return false for identical sets. > ---------------------------------------------------------- > > Key: LUCENE-2216 > URL: https://issues.apache.org/jira/browse/LUCENE-2216 > Project: Lucene - Java > Issue Type: Bug > Components: Other > Affects Versions: 2.9, 2.9.1, 3.0 > Reporter: Dawid Weiss > Priority: Minor > Attachments: LUCENE-2216.patch, openbitset.patch > > > OpenBitSet uses an internal buffer of long variables to store set bits and an > additional 'wlen' index that points > to the highest used component inside {...@link #bits} buffer. > Unlike in JDK, the wlen field is not continuously maintained (on clearing > bits, for example). This leads to a situation when wlen may point > far beyond the last set bit. > The hashCode implementation iterates over all long components of the bits > buffer, rotating the hash even for empty components. This is against the > contract of hashCode-equals. The following test case illustrates this: > {code} > // initialize two bitsets with different capacity (bits length). > BitSet bs1 = new BitSet(200); > BitSet bs2 = new BitSet(64); > // set the same bit. > bs1.set(3); > bs2.set(3); > > // equals returns true (passes). > assertEquals(bs1, bs2); > // hashCode returns false (against contract). > assertEquals(bs1.hashCode(), bs2.hashCode()); > {code} > Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org