[
https://issues.apache.org/jira/browse/LUCENE-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736443#comment-14736443
]
Adrien Grand commented on LUCENE-6788:
--------------------------------------
Hmm, actually it looks to me like having a positive value is not necessary:
the only thing we do with the result of the hash is bitwise-AND it with the
bloom size, which would work fine with a negative number too.
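
A minimal sketch of that point, not the FuzzySet code itself. The class name,
the bitIndex helper and the bloomSize value of 1023 are made up for
illustration, and it assumes the bloom size is a non-negative mask of the form
2^n - 1. Under that assumption the AND clears the sign bit, so the result stays
within [0, bloomSize] even for Integer.MIN_VALUE:

```java
class MaskDemo {
    // Hypothetical helper: maps a hash to a bit position by masking.
    // Assumes bloomSize is a non-negative mask such as 2^n - 1.
    static int bitIndex(int hash, int bloomSize) {
        return hash & bloomSize;
    }

    public static void main(String[] args) {
        int bloomSize = 1023; // illustrative value, 2^10 - 1
        int[] hashes = {42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : hashes) {
            int index = bitIndex(h, bloomSize);
            // The index is never negative because the mask has its sign bit cleared.
            System.out.println(h + " -> " + index + " (non-negative: " + (index >= 0) + ")");
        }
    }
}
```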
> Mishandling of Integer.MIN_VALUE in FuzzySet leads to AssertionError
> --------------------------------------------------------------------
>
> Key: LUCENE-6788
> URL: https://issues.apache.org/jira/browse/LUCENE-6788
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.10.4, Trunk
> Reporter: Robert Tarrall
>
> Reindexing some data in the DataStax Enterprise Search product (which uses
> Solr) led to these stack traces:
> ERROR [Lucene Merge Thread #13430] 2015-09-08 11:14:36,582 CassandraDaemon.java (line 258) Exception in thread Thread[Lucene Merge Thread #13430,6,main]
> org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError
>     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
> Caused by: java.lang.AssertionError
>     at org.apache.lucene.codecs.bloom.FuzzySet.mayContainValue(FuzzySet.java:216)
>     at org.apache.lucene.codecs.bloom.FuzzySet.contains(FuzzySet.java:165)
>     at org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat$BloomFilteredFieldsProducer$BloomFilteredTermsEnum.seekExact(BloomFilteringPostingsFormat.java:351)
>     at org.apache.lucene.index.BufferedUpdatesStream.applyTermDeletes(BufferedUpdatesStream.java:414)
>     at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:283)
>     at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3838)
>     at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3799)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3651)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
> In tracking down the cause of the stack trace, I noticed this:
> https://github.com/apache/lucene-solr/blob/trunk/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java#L164
> It is possible for the Murmur2 hash to return Integer.MIN_VALUE (e.g. when
> hashing "WeH44wlbCK"). Multiplying Integer.MIN_VALUE by -1 returns
> Integer.MIN_VALUE again, so the "positiveHash >= 0" assertion at line 217
> fails.
> We could special-case Integer.MIN_VALUE, map it to 42 or some other magic
> number... since the same "* -1" logic appears on line 236 perhaps it should
> be part of the hash function?
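
For reference, the overflow described in the report can be reproduced in
isolation. This is a sketch under stated assumptions, not the FuzzySet source:
negateIfNegative mirrors the "* -1" pattern as described above, and
clearSignBit is one hypothetical alternative (not necessarily the fix that
will be committed) that always yields a non-negative value. Both names are
made up for illustration.

```java
class MinValueDemo {
    // Mirrors the "* -1" pattern from the report. In two's complement,
    // Integer.MIN_VALUE has no positive counterpart, so it maps to itself.
    static int negateIfNegative(int hash) {
        return hash < 0 ? hash * -1 : hash;
    }

    // Hypothetical alternative: clear the sign bit instead of negating.
    // The result is non-negative for every int input.
    static int clearSignBit(int hash) {
        return hash & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;
        System.out.println("negateIfNegative >= 0: " + (negateIfNegative(h) >= 0));
        System.out.println("clearSignBit >= 0:     " + (clearSignBit(h) >= 0));
    }
}
```

Note that Math.abs(Integer.MIN_VALUE) has the same problem as the
multiplication: it also returns Integer.MIN_VALUE.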