[ 
https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5604:
---------------------------------------

    Attachment: BytesRefHash.perturb.patch

Separately, I also tried a different probing function inside
BytesRefHash, poaching the "perturbing" approach from Python's
dictionary object:

{noformat}
Wiki
  murmur + perturb: 134.228 sec, 176358406 conflicts

Geonames
  murmur + perturb: 167.735 sec, 200311281 conflicts
{noformat}

Curiously, it increased the number of collisions from Murmur3 alone.
It's possible I messed up the implementation (though all Lucene tests
did pass).

Or, it could be that because we only use 32 bits for our hash code
(Python uses 64 bit hash codes on 64 bit arch), we just don't have
enough bits to mixin when probing for new addresses.

In fact, if we move all hashing to be private (under the hood) of
BytesRefHash, maybe we could switch to the 128 bit variant MurmurHash3
and then the "perturbing" might help.


> Should we switch BytesRefHash to MurmurHash3?
> ---------------------------------------------
>
>                 Key: LUCENE-5604
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5604
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.9, 5.0
>
>         Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch
>
>
> MurmurHash3 has better hashing distribution than the current hash function we 
> use for BytesRefHash which is a simple multiplicative function with 31 
> multiplier (same as Java's String.hashCode, but applied to bytes not chars).  
> Maybe we should switch ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to