[
https://issues.apache.org/jira/browse/LUCENE-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737126#comment-14737126
]
Robert Tarrall commented on LUCENE-6788:
----------------------------------------
After sleeping on it... if we need a positive value, "hash = hash &
Integer.MAX_VALUE" would be the correct way to force it positive, rather than
using a magic number.
That said, yeah, I'm not able to follow the logic well enough to know whether
it needs to be positive, or even an integer. Overall it seems like using all
32 bits available from the hashing function would be a win.
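
For illustration, the masking suggested above can be sketched as follows. This is a minimal, standalone example, not the FuzzySet code; the class name NonNegativeHash and the helper toNonNegative are hypothetical names chosen for the sketch.

```java
public class NonNegativeHash {

    // Hypothetical helper: clears the sign bit and keeps the low 31 bits,
    // so the result is always in the range [0, Integer.MAX_VALUE].
    static int toNonNegative(int hash) {
        return hash & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(toNonNegative(Integer.MIN_VALUE)); // prints 0
        System.out.println(toNonNegative(-1));                // prints 2147483647
        System.out.println(toNonNegative(42));                // prints 42
    }
}
```

Note that masking does not give the same result as negation for negative inputs (for example, -1 maps to 2147483647 rather than 1), so anything that depends on the previously computed values would see different results.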
> Mishandling of Integer.MIN_VALUE in FuzzySet leads to AssertionError
> --------------------------------------------------------------------
>
> Key: LUCENE-6788
> URL: https://issues.apache.org/jira/browse/LUCENE-6788
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.10.4, Trunk
> Reporter: Robert Tarrall
>
> Reindexing some data in the DataStax Enterprise Search product (which uses
> Solr) led to these stack traces:
> ERROR [Lucene Merge Thread #13430] 2015-09-08 11:14:36,582 CassandraDaemon.java (line 258) Exception in thread Thread[Lucene Merge Thread #13430,6,main]
> org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError
>     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
> Caused by: java.lang.AssertionError
>     at org.apache.lucene.codecs.bloom.FuzzySet.mayContainValue(FuzzySet.java:216)
>     at org.apache.lucene.codecs.bloom.FuzzySet.contains(FuzzySet.java:165)
>     at org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat$BloomFilteredFieldsProducer$BloomFilteredTermsEnum.seekExact(BloomFilteringPostingsFormat.java:351)
>     at org.apache.lucene.index.BufferedUpdatesStream.applyTermDeletes(BufferedUpdatesStream.java:414)
>     at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:283)
>     at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3838)
>     at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3799)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3651)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
> In tracking down the cause of the stack trace, I noticed this:
> https://github.com/apache/lucene-solr/blob/trunk/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java#L164
> It is possible for the Murmur2 hash to return Integer.MIN_VALUE (e.g. when
> hashing "WeH44wlbCK"). Multiplying Integer.MIN_VALUE by -1 returns
> Integer.MIN_VALUE again, so the "positiveHash >= 0" assertion at line 217
> fails.
> We could special-case Integer.MIN_VALUE and map it to 42 or some other magic
> number. Since the same "* -1" logic appears on line 236, perhaps it should
> be part of the hash function?
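
The overflow described in the report can be reproduced in isolation. The sketch below is a hypothetical stand-in for the "* -1" logic, not the actual FuzzySet source; the class name NegateOverflow and the method flipIfNegative are made up for the example.

```java
public class NegateOverflow {

    // Hypothetical stand-in for the "* -1" logic: negate negative hashes.
    // In two's complement, Integer.MIN_VALUE has no positive counterpart,
    // so the multiplication overflows and yields Integer.MIN_VALUE again.
    static int flipIfNegative(int hash) {
        return hash < 0 ? hash * -1 : hash;
    }

    public static void main(String[] args) {
        System.out.println(flipIfNegative(-5));                 // prints 5
        System.out.println(flipIfNegative(Integer.MIN_VALUE));  // prints -2147483648
        // Math.abs has the same edge case for Integer.MIN_VALUE.
        System.out.println(Math.abs(Integer.MIN_VALUE));        // prints -2147483648
    }
}
```

Any check of the form "result >= 0" therefore fails for this single input, which matches the AssertionError in the stack trace.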