> Yes I thought this was why you could largely get away with it. If the
> table is big enough, collisions and chains should not be a big deal to
> begin with.

I admit I didn't think it could be double hashing behind the scenes, so I
assumed a 2^n-1 (power-of-two, mask-based) table -- sorry for not checking
before. It's good to be able to compare alternate implementations, though
(the upside of prime-based sizes is that they grow somewhat less
aggressively).

> Since you have this benchmark readily available -- what happens if you
> un-comment one or both of the lines of code that mixes the hash? That
> would really answer it.

I've run the same set of benchmarks, substituting MurmurHash3's final
avalanche step into HashOperations. In short, it does get a tad slower on
average, but is (predictably) more stable --

BenchmarkContainsWithRemoved (2000000 calls):
Before:
removedKeys    ms linear runtime
          0  95.6 =====================
       0.25 116.2 ==========================
        0.5 131.8 ==============================
       0.75  78.3 =================
          1  12.9 ==
After:
removedKeys    ms linear runtime
          0 101.4 =====================
       0.25 123.6 ==========================
        0.5 138.6 ==============================
       0.75  82.0 =================
          1  14.7 ===

BenchmarkPut (1000000 calls):
Before:
distribution   ms linear runtime
      RANDOM 61.8 ==============================
      LINEAR 20.9 ==========
    HIGHBITS 54.8 ==========================
After:
distribution   ms linear runtime
      RANDOM 64.9 =============================
      LINEAR 65.2 =============================
    HIGHBITS 65.7 ==============================
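
For reference, the avalanche step I substituted is essentially MurmurHash3's
32-bit finalizer; a rough sketch below (the constants are MurmurHash3's, the
method name and how exactly HashOperations ends up calling it are mine):

  // MurmurHash3 fmix32 -- the final avalanche (bit-mixing) step.
  // How this is wired into HashOperations is not shown here.
  public static int mix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }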

So you're right -- I don't think it makes sense to add that hashing unless
you can devise a worst-case sequence that breaks double hashing (I can't).
Note, however, that these times are generally slower than those of 2^n-1
(mask-based) implementations with hashing and linear probing; that seems
understandable given CPU caches and the fact that less (and simpler) math
needs to be done for every key there. The advantage I see in double hashing
is being able to devise a prime sequence that doesn't grow as fast as 2^n.
A sketch of what I mean follows.
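
Something along these lines, for example (the 1.5x factor and the helpers
are purely illustrative, not existing code):

  // Illustrative only: grow capacity by ~1.5x and round up to a prime,
  // instead of doubling to the next power of two.
  static int nextCapacity(int current) {
    return nextPrime((int) (current * 1.5) + 1);
  }

  static int nextPrime(int n) {
    while (!isPrime(n)) n++;
    return n;
  }

  static boolean isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; (long) d * d <= n; d++) {
      if (n % d == 0) return false;
    }
    return true;
  }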

So the conclusion: no need for hashing. Which makes me wonder why we bother
to do this, then:

  public static int hash(float value) {
    return Float.floatToIntBits(value * 663608941.737f);
  }

or this, for that matter:

  public static int hash(long value) {
    return (int) (value ^ (value >> 32));
  }

Note that all the bits cancel out to zero if your input keys have the form
(value << 32 | value). If you remember, I mentioned guarding against the
worst case -- a degenerate input like this seems more probable than one that
breaks a randomized hash (run it):

        OpenLongIntHashMap map = new OpenLongIntHashMap();
        for (int i = 0; i < 1000000; i++) {
            // every key below hashes to 0 with hash(long) as quoted above
            long k = (long) i << 32 | i;
            map.put(k, i);
        }
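
You can also see the degenerate hashes directly, without any map involved,
by inlining the hash(long) body quoted above:

        // Every key of the form (i << 32) | i collapses to 0 under
        // (int) (value ^ (value >> 32)): the low words cancel, and the
        // high word is discarded by the cast to int.
        for (int i = 0; i < 5; i++) {
            long k = (long) i << 32 | i;
            System.out.println(k + " -> " + (int) (k ^ (k >> 32)));  // prints 0
        }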

Dawid
