[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968122#comment-13968122 ]
Adrien Grand commented on LUCENE-5604:
--------------------------------------

Strong +1 on this change!

bq. Separately, I also tried a different probing function inside BytesRefHash

I'm wondering if we should try linear probing? Now that we use a good hash function, the likelihood of having clusters of hashes in the hash table is much lower (especially given that BytesRefHash hard-codes quite a low load factor: 0.5), so linear probing might help get some performance back since it tends to be more cache-friendly?

bq. I added a small test case, confirming our MurmurHash3 impl matches a separate Python/C impl I found

Maybe we could add Guava as a test dependency and do some duels on random bytes?

> Should we switch BytesRefHash to MurmurHash3?
> ---------------------------------------------
>
>                 Key: LUCENE-5604
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5604
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.9, 5.0
>
>         Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch,
> LUCENE-5604.patch
>
>
> MurmurHash3 has better hashing distribution than the current hash function we
> use for BytesRefHash which is a simple multiplicative function with 31
> multiplier (same as Java's String.hashCode, but applied to bytes not chars).
> Maybe we should switch ...
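
As a rough illustration of the linear-probing idea raised in the comment above, here is a minimal sketch of an open-addressing table that hashes byte arrays with MurmurHash3 (x86, 32-bit), steps to the next slot on a collision, and keeps the table at most half full. It is not the BytesRefHash code and not the attached patch: the class name LinearProbingSketch, the seed, the initial size, and the use of an ArrayList for key storage are all assumptions made for the example. Only the "-(id + 1) when the key already exists" return convention is borrowed from BytesRefHash.add.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: linear probing with a 0.5 load factor over MurmurHash3.
final class LinearProbingSketch {
    private static final int SEED = 0;                    // arbitrary seed (assumption)
    private final List<byte[]> keys = new ArrayList<>();  // id -> key bytes
    private int[] table = new int[16];                    // slot -> id, -1 means empty

    LinearProbingSketch() {
        Arrays.fill(table, -1);
    }

    // Returns the new id, or -(id + 1) if the key is already present.
    public int add(byte[] key) {
        int mask = table.length - 1;                      // table length is a power of two
        int slot = murmur3_32(key, 0, key.length, SEED) & mask;
        while (table[slot] != -1) {
            if (Arrays.equals(keys.get(table[slot]), key)) {
                return -(table[slot] + 1);
            }
            slot = (slot + 1) & mask;                     // linear probe: try the next slot
        }
        int id = keys.size();
        keys.add(key.clone());
        table[slot] = id;
        if (keys.size() * 2 > table.length) {             // keep the load factor at or below 0.5
            rehash(table.length * 2);
        }
        return id;
    }

    private void rehash(int newSize) {
        int[] newTable = new int[newSize];
        Arrays.fill(newTable, -1);
        int mask = newSize - 1;
        for (int id = 0; id < keys.size(); id++) {
            byte[] k = keys.get(id);
            int slot = murmur3_32(k, 0, k.length, SEED) & mask;
            while (newTable[slot] != -1) {
                slot = (slot + 1) & mask;
            }
            newTable[slot] = id;
        }
        table = newTable;
    }

    // MurmurHash3, x86 32-bit variant, reading little-endian 4-byte blocks.
    public static int murmur3_32(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = offset + (len & 0xfffffffc);
        for (int i = offset; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        int k1 = 0;
        switch (len & 3) {                                // tail bytes; fall-through is intended
            case 3: k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2: k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1: k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }
        h1 ^= len;                                        // finalization mix
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    public static void main(String[] args) {
        LinearProbingSketch hash = new LinearProbingSketch();
        System.out.println(hash.add("lucene".getBytes(StandardCharsets.UTF_8))); // 0
        System.out.println(hash.add("murmur".getBytes(StandardCharsets.UTF_8))); // 1
        System.out.println(hash.add("lucene".getBytes(StandardCharsets.UTF_8))); // -1
    }
}
```

Whether the sequential slot access actually wins back time depends on the key distribution and table size, so a change like this would still need benchmarking against the existing probing function.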