I am 100% in favor of replacing the colt-based thing in there now with something better if there is something better. I'll commit the patch, or help test, or whatever.
I volunteered to put lipstick on the colt-based pig because I really
wanted *some* non-GPL alternative to Trove at Apache, and the colt-based
code sitting around Mahout was the shortest path.

On Fri, Mar 4, 2011 at 6:11 AM, Dawid Weiss <[email protected]> wrote:
> A while ago we discussed whether it'd make sense to integrate HPPC with
> Mahout. I remember Ted mentioned it'd be good to have some kind of
> comparative benchmark that would simply compare Mahout Math collections
> against HPPC. I finally found the time to do it, driven by some strong
> updates Sebastiano Vigna introduced in fastutil. But first, a
> disclaimer :)
>
> I am not trying to impose HPPC on anybody and I'm not trying to show
> which library is "better" or "worse" in a general sense. I believe the
> benchmarks presented below, and any conclusions that can be drawn from
> them, apply only in a marginal sense to any algorithm complex enough to
> be doing something ELSE other than stressing a given hash map
> implementation. I also believe that diversity drives progress, so it's
> good to share ;)
>
> So. First, I've been thinking about how to measure hash map performance
> in any sensible (read: reasonable) application. Sebastiano suggested some
> clues and I came up with some other ideas, but any realistic benchmarks
> (in the sense of data or algorithm) would be a welcome extension. I've
> implemented the following benchmarks (all on int-int maps):
>
> - BenchmarkPut: injecting a lot of integers into a map, basically what
>   one would do to get a unique set of keys, for example. The input ints
>   have different distributions: linear, random and high-bits linear
>   (increments on higher bits, then on lower bits).
>
> - BenchmarkGetWithRemoved: injecting a lot of integers, then removing a
>   fraction of them (not benchmarked), and (benchmarked) querying this map
>   for a partially matching set of keys. A realistic scenario if you have
>   removals from the map.
>
> - BenchmarkBigramCounting: calculating bigram frequencies from a real
>   corpus of texts in Polish. The input is 10MB long, so it's not a big
>   data set, but the texts are realistic and hence so are the bigram
>   distributions.
>
> I tested the following hash map implementations (int-int primitives):
>
> - fastutil (6.1.0),
> - HPPC (trunk, but without improvements resulting from these benchmarks
>   yet, so I'm not biased ;)
> - Java HashMap (on boxed Integer keys),
> - GNU Trove (3.0.0rc1),
> - Mahout Math/Collections (1.0).
>
> The bigram counting also includes PCJ and fastutil's linked hash map
> implementation for historic reasons (but these are less relevant).
>
> The benchmarks were done on a Windows 7 machine with an i7 CPU, detailed
> log attached, using Google Caliper. Times below are in ms. Links should
> lead you to on-line charts. Some conclusions:
>
> 1) BenchmarkPut,
> http://dawidweiss123.appspot.com/run/[email protected]/com.carrotsearch.hppc.caliper.BenchmarkPut
>
> distribution  implementation     ms  linear runtime
>       RANDOM            HPPC   77.9  ==========
>       LINEAR            HPPC   79.1  ==========
>     HIGHBITS            HPPC   78.4  ==========
>       RANDOM        FASTUTIL   97.4  ============
>       LINEAR        FASTUTIL  101.2  =============
>     HIGHBITS        FASTUTIL  101.3  =============
>       RANDOM            JAVA  232.6  ==============================
>       LINEAR            JAVA   72.7  =========
>     HIGHBITS            JAVA  184.2  =======================
>       RANDOM          MAHOUT  125.9  ================
>       LINEAR          MAHOUT   60.3  =======
>     HIGHBITS          MAHOUT   98.4  ============
>       RANDOM           TROVE  113.8  ==============
>       LINEAR           TROVE   43.8  =====
>     HIGHBITS           TROVE   83.7  ==========
>
> I believe additional good hashing of integer keys does help to even out
> any weird input key distributions (and these would occur a lot in
> practical situations, I bet). fastutil uses the finalization step from
> MurmurHash3, HPPC uses a simplified full pass over the int's 4 bytes from
> MurmurHash2, but these yield an equal effect (the speed difference is the
> result of something else).
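For reference, the MurmurHash3 finalization step Dawid mentions is small enough to show inline. This is the standard fmix32 mixer, not a quote of fastutil's actual source; the class and method names here are mine:

```java
// Standard 32-bit MurmurHash3 finalizer ("fmix32"): scrambles an int key
// so that patterned inputs (linear, high-bit increments) spread evenly
// over power-of-two table sizes. Not fastutil's source; an illustration.
public final class HashMix {
    static int fmix32(int k) {
        k ^= k >>> 16;
        k *= 0x85ebca6b;
        k ^= k >>> 13;
        k *= 0xc2b2ae35;
        k ^= k >>> 16;
        return k;
    }

    public static void main(String[] args) {
        // Sequential keys land in scattered buckets after mixing:
        int mask = (1 << 4) - 1; // a hypothetical table of 16 slots
        for (int key = 0; key < 4; key++) {
            System.out.println(key + " -> slot " + (fmix32(key) & mask));
        }
    }
}
```

The whole mix is five cheap register operations, which is why adding it to a map that currently hashes ints as themselves costs very little.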
> Mahout Math currently doesn't hash ints, so it'd probably be a good
> addition (at very little overhead).
>
> 2) BenchmarkGetWithRemoved:
> http://dawidweiss123.appspot.com/run/[email protected]/com.carrotsearch.hppc.caliper.BenchmarkGetWithRemoved
>
> %rem.keys  implementation      ms  linear runtime
>         0            HPPC   65.73  ============
>         0        FASTUTIL   52.63  ==========
>         0            JAVA  156.37  ==============================
>         0           TROVE   75.31  ==============
>         0          MAHOUT   85.59  ================
>      0.25            HPPC   87.19  ================
>      0.25        FASTUTIL   58.24  ===========
>      0.25            JAVA  149.26  ============================
>      0.25           TROVE  123.32  =======================
>      0.25          MAHOUT  113.77  =====================
>       0.5            HPPC  101.70  ===================
>       0.5        FASTUTIL   55.74  ==========
>       0.5            JAVA  135.01  =========================
>       0.5           TROVE   93.19  =================
>       0.5          MAHOUT  125.16  ========================
>      0.75            HPPC   93.98  ==================
>      0.75        FASTUTIL   42.20  ========
>      0.75            JAVA  101.29  ===================
>      0.75           TROVE   76.07  ==============
>      0.75          MAHOUT   94.67  ==================
>         1            HPPC   60.52  ===========
>         1        FASTUTIL   10.78  ==
>         1            JAVA   44.30  ========
>         1           TROVE    9.54  =
>         1          MAHOUT   30.98  =====
>
> This shows two things. The first is that open addressing based on double
> hashing (with REMOVED state markers) requires either automatic or manual
> compaction (rehashing) to avoid performance degradation. HPPC doesn't do
> this because we rely on the user, and so the time grows with the number
> of removed keys:
>
>     0  HPPC   65.73  ============
>  0.25  HPPC   87.19  ================
>   0.5  HPPC  101.70  ===================
>  0.75  HPPC   93.98  ==================
>     1  HPPC   60.52  ===========
>
> (the drop at 100% of removed keys is probably HotSpot over-learning and
> optimizing differently). It is curious why Mahout's implementation was so
> much slower than everything else, but I haven't had the time to
> investigate yet.
> The only thing that comes to my mind is its use of modulus (%) for
> sizing instead of bit masking (restricted 2^n map sizes). Also note that
> even regular Java does a fairly good job here (the number of keys was
> 2,000,000, but even for larger inputs the results were quite similar).
> Also, note fastutil here:
>
>     0  FASTUTIL  52.63  ==========
>  0.25  FASTUTIL  58.24  ===========
>   0.5  FASTUTIL  55.74  ==========
>  0.75  FASTUTIL  42.20  ========
>     1  FASTUTIL  10.78  ==
>
> Nice. Sebastiano reimplemented hash maps to use linear addressing instead
> of double hashing, and with linear addressing you can avoid REMOVED state
> markers entirely (so there's no need to rehash, and you get empty spots
> back for reuse when adding new elements). The key here is the combination
> of linear addressing AND a good key hash mangling function
> (theoretically, with a perfect hashing function, the input key
> distribution wouldn't affect linear addressing performance at all -- no
> clustering or chaining effects).
>
> 3) Bigrams,
> http://dawidweiss123.appspot.com/run/[email protected]/com.carrotsearch.hppc.caliper.BenchmarkBigramCounting
>
> This is a toy example, I know, but it's the only realistic one in the
> set, so it's probably worth mentioning. It is a little bit puzzling to me
> why HPPC is (by a small margin) faster than fastutil, because it really
> shouldn't be. This isn't exactly noise (variance is given in the attached
> plain-text log), but I still consider it a glitch within the regular
> variance.
>
> You can try these on your own machine -- I would actually appreciate it
> if you sent your results and feedback back to me or this list so that we
> can see how the above holds up across architectures, processors and VMs.
>
> git clone [email protected]:carrotsearch/hppc.git
> cd hppc-others
> (cd lib/; bash ./install-deps)
> mvn clean package
> java -server -jar hppc-others-0.4.0-SNAPSHOT/hppc-others*.jar | tee benchmark.log
>
> I didn't check the performance of simple lists or other structures with
> minimal overhead because I don't imagine them being significantly faster
> or slower in any library. For these, other factors play a role (I
> personally like the fact that I can access the internal buffers in HPPC,
> but that may also be a point against using it ;).
>
> Dawid
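The deletion trick Dawid credits to fastutil's rewrite -- linear probing with slots genuinely freed on removal, so no REMOVED markers ever accumulate -- is worth seeing concretely. Below is a minimal sketch of the general technique (backward-shift deletion), not fastutil's code; all names, the sentinel choice, and the fixed-capacity simplification are mine:

```java
import java.util.Arrays;

// Minimal int-int open-addressing map using linear probing with
// "backward shift" deletion: on remove, later entries of the same probe
// chain are moved back to close the gap, so the slot is truly freed and
// no tombstone compaction / rehash is ever needed.
// Simplifications: fixed power-of-two capacity, no resize, caller must
// keep the load factor below 1, and the sentinel key must never be used.
public final class LinearProbeIntMap {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel (assumed unused as a key)
    private final int[] keys;
    private final int[] values;
    private final int mask;

    LinearProbeIntMap(int capacityPow2) {
        keys = new int[capacityPow2];
        values = new int[capacityPow2];
        Arrays.fill(keys, EMPTY);
        mask = capacityPow2 - 1;
    }

    private static int mix(int k) { // MurmurHash3 fmix32 scrambler
        k ^= k >>> 16; k *= 0x85ebca6b;
        k ^= k >>> 13; k *= 0xc2b2ae35;
        return k ^ (k >>> 16);
    }

    void put(int key, int value) {
        int slot = mix(key) & mask;
        while (keys[slot] != EMPTY && keys[slot] != key) slot = (slot + 1) & mask;
        keys[slot] = key;
        values[slot] = value;
    }

    int get(int key, int defaultValue) {
        int slot = mix(key) & mask;
        while (keys[slot] != EMPTY) {
            if (keys[slot] == key) return values[slot];
            slot = (slot + 1) & mask;
        }
        return defaultValue;
    }

    void remove(int key) {
        int slot = mix(key) & mask;
        while (keys[slot] != EMPTY && keys[slot] != key) slot = (slot + 1) & mask;
        if (keys[slot] == EMPTY) return; // key not present
        // Backward shift: walk the chain after the gap and pull back any
        // entry whose home slot lies cyclically at or before the gap, so
        // future probes for it still succeed.
        int gap = slot;
        for (int i = (gap + 1) & mask; keys[i] != EMPTY; i = (i + 1) & mask) {
            int home = mix(keys[i]) & mask;
            if (((i - home) & mask) >= ((i - gap) & mask)) {
                keys[gap] = keys[i];
                values[gap] = values[i];
                gap = i;
            }
        }
        keys[gap] = EMPTY; // the freed slot, no REMOVED marker left behind
    }

    public static void main(String[] args) {
        LinearProbeIntMap m = new LinearProbeIntMap(16);
        m.put(42, 7);
        m.remove(42);
        System.out.println(m.get(42, -1)); // prints -1: slot freed, no tombstone
    }
}
```

Contrast this with double hashing, where an entry's probe sequence jumps around the table, so a removed slot can't be closed by shifting neighbors and a REMOVED marker (and eventual rehash) is the usual answer -- exactly the degradation visible in the HPPC rows above.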
