I also did a small experiment myself; it is not conclusive, but I am putting it
here for the record.

adjustOrPutValue() was calling put() when it did not find the entry in the
table, which re-did the hash lookup. It turns out that eliminating that extra
lookup didn't help the benchmarks significantly (a 1.4 ms drop is not
significant).
Before:
library             ms  linear runtime
HPPC              85.5  =================
TROVE            112.9  =======================
FASTUTIL_OPEN     65.3  =============
FASTUTIL_LINKED   61.5  ============
MAHOUT           143.4  ==============================

After:
library             ms  linear runtime
HPPC              85.3  ==================
TROVE            111.1  =======================
FASTUTIL_OPEN     65.0  =============
FASTUTIL_LINKED   61.2  ============
MAHOUT           142.0  ==============================
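For reference, here is the change described above as a minimal sketch. This is
not the actual Mahout code: the class name, the FREE sentinel and the linear
probing are illustrative, and resizing is left out. The point is that
adjustOrPutValue() reuses the slot found by a single probe instead of
delegating to put() and probing again.

    // Minimal sketch: adjustOrPutValue() with a single probe. Illustrative only.
    public class SingleProbeLongIntMap {
      private static final long FREE = Long.MIN_VALUE;  // assumed empty-slot sentinel
      private final long[] keys;
      private final int[] values;

      public SingleProbeLongIntMap(int capacity) {
        keys = new long[capacity];
        values = new int[capacity];
        java.util.Arrays.fill(keys, FREE);
      }

      // One probe sequence locates either the existing entry or the insertion slot.
      public int adjustOrPutValue(long key, int putValue, int adjustBy) {
        int slot = probe(key);
        if (keys[slot] == key) {
          values[slot] += adjustBy;   // found: adjust in place
        } else {
          keys[slot] = key;           // not found: insert here, no second lookup
          values[slot] = putValue;
        }
        return values[slot];
      }

      // Linear probing for brevity; the maps benchmarked above use double hashing.
      private int probe(long key) {
        int slot = (int) ((key ^ (key >>> 32)) & 0x7fffffff) % keys.length;
        while (keys[slot] != FREE && keys[slot] != key) {
          slot = (slot + 1) % keys.length;
        }
        return slot;
      }
    }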
Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
On Thu, Jun 6, 2013 at 9:05 AM, Dawid Weiss <[email protected]> wrote:
> > Yes I thought this was why you could largely get away with it. If the
> > table is big enough, collisions and chains should not be a big deal to
> > begin with.
>
> I admit I didn't think it could be double hashing behind the scenes, so I
> assumed 2^n-1 -- sorry for not checking before. It is good to be able to
> compare alternate implementations (the upside of prime-based sizes is that
> they grow somewhat less aggressively).
>
> > Since you have this benchmark readily available -- what happens if you
> > un-comment one or both of the lines of code that mix the hash? That
> > would really answer it.
>
> I've run the same set of benchmarks, substituting MurmurHash3's last
> avalanche step into HashOperations. The results, in short: it gets a tad
> slower on average, but is (predictably) more stable --
>
> BenchmarkContainsWithRemoved (2000000 calls):
> Before:
> removedKeys    ms  linear runtime
> 0            95.6  =====================
> 0.25        116.2  ==========================
> 0.5         131.8  ==============================
> 0.75         78.3  =================
> 1            12.9  ==
>
> After:
> removedKeys    ms  linear runtime
> 0           101.4  =====================
> 0.25        123.6  ==========================
> 0.5         138.6  ==============================
> 0.75         82.0  =================
> 1            14.7  ===
>
> BenchmarkPut (1000000 calls):
> Before:
> distribution    ms  linear runtime
> RANDOM        61.8  ==============================
> LINEAR        20.9  ==========
> HIGHBITS      54.8  ==========================
>
> After:
> distribution    ms  linear runtime
> RANDOM        64.9  =============================
> LINEAR        65.2  =============================
> HIGHBITS      65.7  ==============================
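>
> For reference, the avalanche step substituted above is presumably
> MurmurHash3's standard 32-bit finalization step (fmix32); in Java it looks
> roughly like this (the method name is mine):
>
>     // MurmurHash3 fmix32: the final bit-mixing ("avalanche") step, with the
>     // standard published constants.
>     static int murmurHash3Avalanche(int h) {
>       h ^= h >>> 16;
>       h *= 0x85ebca6b;
>       h ^= h >>> 13;
>       h *= 0xc2b2ae35;
>       h ^= h >>> 16;
>       return h;
>     }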
>
> So you're right -- I don't think it makes sense to add that hashing unless
> you can devise a worst-case sequence that will break double hashing (I
> can't). Note, however, that these times are generally slower than with the
> 2^n-1 implementations that use hash mixing and linear probing; this makes
> sense given CPU caches and the fact that less (and simpler) math needs to be
> done for every key. The advantage I see in double hashing is being able to
> devise a prime capacity sequence that doesn't grow as fast as 2^n.
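>
> As an illustration only (not code from any of the libraries above), a growth
> policy along those lines could pick the next prime at roughly 1.5x the
> current capacity, while a power-of-two table can only double;
> BigInteger.nextProbablePrime() stands in for a precomputed prime table:
>
>     import java.math.BigInteger;
>
>     public class PrimeGrowth {
>       // Grow by ~1.5x and round up to a prime, instead of doubling.
>       static int nextPrimeCapacity(int current) {
>         long target = (long) (current * 1.5) + 1;
>         return BigInteger.valueOf(target).nextProbablePrime().intValueExact();
>       }
>
>       public static void main(String[] args) {
>         int prime = 11;
>         int powerOfTwo = 16;
>         for (int step = 0; step < 8; step++) {
>           System.out.printf("prime: %d, power-of-two: %d%n", prime, powerOfTwo);
>           prime = nextPrimeCapacity(prime);
>           powerOfTwo <<= 1;  // 2^n tables can only double
>         }
>       }
>     }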
>
> So the conclusion: no need for the extra hashing. Which makes me think: why
> bother to do this, then:
>
>     public static int hash(float value) {
>       return Float.floatToIntBits(value * 663608941.737f);
>     }
>
> or this, for that matter:
>
>     public static int hash(long value) {
>       return (int) (value ^ (value >> 32));
>     }
>
> Note that all bits will cancel out to zero if your input keys are of the form
> (value << 32 | value). If you remember, I mentioned guarding against the
> worst case -- a worst case for a random hash seems less probable than this
> one (run it):
>
>     OpenLongIntHashMap map = new OpenLongIntHashMap();
>     for (int i = 0; i < 1000000; i++) {
>       long k = (long) i << 32 | i;
>       map.put(k, i);
>     }
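>
> To see why (a standalone check, not part of any patch): for keys of that
> form, k >> 32 reproduces the low word, the XOR zeroes it out, and the (int)
> cast drops the high word, so every key hashes to 0:
>
>     public class HashCancellation {
>       public static void main(String[] args) {
>         for (int i = 1; i <= 5; i++) {
>           long k = (long) i << 32 | i;
>           // Prints "<key> -> 0" for every i: i ^ i cancels the low 32 bits.
>           System.out.println(k + " -> " + (int) (k ^ (k >> 32)));
>         }
>       }
>     }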
>
> Dawid
>