[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-5604:
---------------------------------------
    Attachment: LUCENE-5604.patch

New patch, folding in all feedback (thanks!). I think it's ready:

  * I reverted the Solr changes

  * I duplicated the murmurhash3_x86_32 variant that takes byte[] into StringHelper, but changed it to use the Integer.rotateLeft intrinsic

  * I added a small test case, confirming that our MurmurHash3 impl matches a separate Python/C impl I found

  * I made the hashing private to BytesRefHash, and changed TermToBytesAtt.fillBytesRef to return void

  * For the seed/salt, I now pull from the tests.seed property if it's non-null

> Should we switch BytesRefHash to MurmurHash3?
> ---------------------------------------------
>
>                 Key: LUCENE-5604
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5604
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.9, 5.0
>
>         Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch, LUCENE-5604.patch
>
>
> MurmurHash3 has a better hashing distribution than the current hash function we
> use for BytesRefHash, which is a simple multiplicative function with a multiplier
> of 31 (the same as Java's String.hashCode, but applied to bytes, not chars).
> Maybe we should switch ...
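
For readers without the patch at hand, here is a minimal sketch of the two hash functions being compared. It is not the code in LUCENE-5604.patch: the class name Murmur3Sketch, the helpers multiplicative31 and chooseSeed, and the wall-clock fallback for the seed are assumptions made for illustration. The MurmurHash3 body follows the public x86_32 algorithm, with Integer.rotateLeft for the rotations.

```java
final class Murmur3Sketch {
  private Murmur3Sketch() {}

  /** MurmurHash3 x86_32 over data[offset, offset+len), per the public algorithm. */
  static int murmurhash3_x86_32(byte[] data, int offset, int len, int seed) {
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;

    int h1 = seed;
    int roundedEnd = offset + (len & 0xfffffffc); // round down to a 4-byte block

    // Body: mix one little-endian 32-bit block at a time.
    for (int i = offset; i < roundedEnd; i += 4) {
      int k1 = (data[i] & 0xff)
          | ((data[i + 1] & 0xff) << 8)
          | ((data[i + 2] & 0xff) << 16)
          | (data[i + 3] << 24);
      k1 *= c1;
      k1 = Integer.rotateLeft(k1, 15);
      k1 *= c2;

      h1 ^= k1;
      h1 = Integer.rotateLeft(h1, 13);
      h1 = h1 * 5 + 0xe6546b64;
    }

    // Tail: the remaining 0 to 3 bytes.
    int k1 = 0;
    switch (len & 0x03) {
      case 3:
        k1 = (data[roundedEnd + 2] & 0xff) << 16;
        // fall through
      case 2:
        k1 |= (data[roundedEnd + 1] & 0xff) << 8;
        // fall through
      case 1:
        k1 |= (data[roundedEnd] & 0xff);
        k1 *= c1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= c2;
        h1 ^= k1;
        break;
      default:
        break;
    }

    // Finalization: fold in the length, then avalanche the bits.
    h1 ^= len;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  /** Hypothetical stand-in for the old hash: multiplier 31, applied to bytes. */
  static int multiplicative31(byte[] data, int offset, int len) {
    int h = 0;
    for (int i = offset; i < offset + len; i++) {
      h = 31 * h + data[i];
    }
    return h;
  }

  /**
   * Hypothetical seed selection: derive the seed from the tests.seed system
   * property when it is set, so test runs are reproducible. The wall-clock
   * fallback is an assumption, not something the message states.
   */
  static int chooseSeed() {
    String prop = System.getProperty("tests.seed");
    return prop != null ? prop.hashCode() : (int) System.currentTimeMillis();
  }

  public static void main(String[] args) {
    byte[] term = {'l', 'u', 'c', 'e', 'n', 'e'};
    int seed = chooseSeed();
    System.out.println(Integer.toHexString(murmurhash3_x86_32(term, 0, term.length, seed)));
    System.out.println(Integer.toHexString(multiplicative31(term, 0, term.length)));
  }
}
```

Running with -Dtests.seed=... fixes the seed, so the MurmurHash3 output repeats across runs; without it, the seed (and therefore the hash) varies from run to run in this sketch.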