Re: Performance of primitive collections

Dawid Weiss Fri, 04 Mar 2011 06:42:15 -0800

I know I'm shooting my own foot here, but fastutil is now ASL licensed and
it provides Java Collections compliance, so it should be a drop-in
replacement for former Colt collections... It is a larger library though
(but then, it provides more stuff ;).


Dawid

On Fri, Mar 4, 2011 at 3:38 PM, Benson Margulies <[email protected]>wrote:

> I am 100% in favor of replacing the colt-based thing in there now with
> something better if there is something better. I'll commit the patch,
> or help test, or whatever.
>
> I volunteered to put lipstick on the colt-based pig because I really
> wanted *some* non-GPL alternative to Trove at Apache, and the
> colt-based code sitting around Mahout was the shortest path.
>
>
> On Fri, Mar 4, 2011 at 6:11 AM, Dawid Weiss
> <[email protected]> wrote:
> > A while ago we discussed if it'd make sense to integrate HPPC with
> Mahout, I
> > remember Ted mentioned it'd be good to have some kind of comparative
> > benchmark that would simply compare Mahout Math collections against HPPC.
> I
> > finally found the time to do it, driven by some strong updates Sebastiano
> > Vinga introduced in fastutil. But first, a disclaimer :)
> >
> > I am not trying to impose HPPC on anybody and I'm not trying to show
> which
> > library is "better" or "worse" in a general sense. I believe the
> benchmarks
> > presented below and any conclusions that can be drawn from them apply
> only
> > in a marginal sense to any algorithm complex enough to be doing something
> > ELSE other than stressing a given hash map implementation. I also believe
> > that diversity drives progress, so it's good to share ;)
> >
> > So. First, I've been thinking about how to measure hash map performance
> in
> > any sensible (read: reasonable) application. Sebastiano suggested some
> clues
> > and I came up with some other ideas, but any realistic benchmarks (in the
> > sense of data or algorithm) would be a welcome extension. I've
> implemented
> > the following benchmarks (all on int-int maps):
> >
> > - BenchmarkPut: injecting a lot of integers in a map, basically what one
> > would do to get a unique set of keys, for example. The input ints have
> > different distributions: linear, random and high-bits linear (increments
> on
> > higher bits, then on lower bits).
> >
> > - BenchmarkGetWithRemoved: injecting a lot of integers, then removing a
> > fraction of them (not benchmarked), and (benchmarked) querying this map
> for
> > a partially matching set of keys. A realistic scenario if you have
> removals
> > from the map.
> >
> > - BenchmarkBigramCounting: calculating bigram frequencies from a real
> corpus
> > of texts in Polish. The input is 10MB long, so it's not a big data set,
> but
> > the texts are realistic and hence so are bigram distributions.
> >
> > I tested the following hash map implementations (int-int primitives):
> >
> > - fastutil (6.1.0),
> > - HPPC (trunk, but without improvements resulting from these benchmarks
> yet,
> > so I'm not biased ;)
> > - Java HashMap (on boxed Integer keys)
> > - GNU Trove (3.0.0rc1),
> > - Mahout Math/Collections (1.0)
> >
> > The bigram counting also includes PCJ and fastutil's linked hash map
> > implementation for historic reasons (but these are less relevant).
> >
> > The benchmarks were done on a Windows7 machine with I7 CPU, detailed log
> > attached, using Google Caliper. Times below are in ms. Links
> > should lead you to on-line charts. Some conclusions:
> >
> > 1) BenchmarkPut,
> >
> http://dawidweiss123.appspot.com/run/[email protected]/com.carrotsearch.hppc.caliper.BenchmarkPut
> >
> > distribution implementation    ms linear runtime
> >      RANDOM           HPPC  77.9 ==========
> >      LINEAR           HPPC  79.1 ==========
> >    HIGHBITS           HPPC  78.4 ==========
> >      RANDOM       FASTUTIL  97.4 ============
> >      LINEAR       FASTUTIL 101.2 =============
> >    HIGHBITS       FASTUTIL 101.3 =============
> >      RANDOM           JAVA 232.6 ==============================
> >      LINEAR           JAVA  72.7 =========
> >    HIGHBITS           JAVA 184.2 =======================
> >      RANDOM         MAHOUT 125.9 ================
> >      LINEAR         MAHOUT  60.3 =======
> >    HIGHBITS         MAHOUT  98.4 ============
> >      RANDOM          TROVE 113.8 ==============
> >      LINEAR          TROVE  43.8 =====
> >    HIGHBITS          TROVE  83.7 ==========
> >
> > I believe additional good hashing of integer keys does help to even-out
> any
> > weird input key distributions (and these would occur a lot in practical
> > situations, I bet). fastutil uses the finalization step from MurmurHash3,
> > HPPC uses a simplified full-pass over int4 bytes from MurmurHash2, but
> these
> > yield an equal effect (the speed difference is a result of another
> thing).
> > Mahout Math currently doesn't hash ints, so it'd be probably a good
> addition
> > (at very little overhead).
> >
> > 2) BenchmarkGetWithRemoved:
> >
> http://dawidweiss123.appspot.com/run/[email protected]/com.carrotsearch.hppc.caliper.BenchmarkGetWithRemoved
> >
> > %rem.keys   implementation     ms linear runtime
> >          0           HPPC  65.73 ============
> >          0       FASTUTIL  52.63 ==========
> >          0           JAVA 156.37 ==============================
> >          0          TROVE  75.31 ==============
> >          0         MAHOUT  85.59 ================
> >       0.25           HPPC  87.19 ================
> >       0.25       FASTUTIL  58.24 ===========
> >       0.25           JAVA 149.26 ============================
> >       0.25          TROVE 123.32 =======================
> >       0.25         MAHOUT 113.77 =====================
> >        0.5           HPPC 101.70 ===================
> >        0.5       FASTUTIL  55.74 ==========
> >        0.5           JAVA 135.01 =========================
> >        0.5          TROVE  93.19 =================
> >        0.5         MAHOUT 125.16 ========================
> >       0.75           HPPC  93.98 ==================
> >       0.75       FASTUTIL  42.20 ========
> >       0.75           JAVA 101.29 ===================
> >       0.75          TROVE  76.07 ==============
> >       0.75         MAHOUT  94.67 ==================
> >          1           HPPC  60.52 ===========
> >          1       FASTUTIL  10.78 ==
> >          1           JAVA  44.30 ========
> >          1          TROVE   9.54 =
> >          1         MAHOUT  30.98 =====
> >
> > This shows two things. The first thing is that open addressing based on
> > double hashing (with REMOVED state markers) requires either automatic or
> > manual compaction (rehashing) to avoid performance degradation. HPPC
> doesn't
> > do this because we rely on the user and so the time grows with the number
> of
> > removed keys:
> >
> >          0           HPPC  65.73 ============
> >       0.25           HPPC  87.19 ================
> >        0.5           HPPC 101.70 ===================
> >       0.75           HPPC  93.98 ==================
> >          1           HPPC  60.52 ===========
> >
> > (the drop at 100% of removed keys is probably HotSpot overlearning and
> > optimizing differently). It is curious as to why Mahout's implementation
> was
> > so much slower than anything else, but I didn't have the time to
> investigate
> > yet. The only thing that comes to my mind is using modulus % for sizing
> > instead of bitmasking (restricted 2^n map sizes). Also note that even
> > regular Java does a fairly good job here (the number of keys was
> 2,000,000,
> > but even for larger inputs the results were quite similar). Also, note
> > fastutil here:
> >
> >          0       FASTUTIL  52.63 ==========
> >       0.25       FASTUTIL  58.24 ===========
> >        0.5       FASTUTIL  55.74 ==========
> >       0.75       FASTUTIL  42.20 ========
> >          1       FASTUTIL  10.78 ==
> >
> > Nice. Sebastiano reimplemented hash maps to use linear addressing instead
> of
> > double hashing and with linear addressing you can avoid REMOVED state
> > markers entirely (so no need to rehash and you'll get empty spots for
> reuse
> > if adding new elements). The key here is the combination of linear
> > addressing AND a good key hash mangling function (theoretically, with a
> > perfect hashing function, the input key distribution wouldn't affect
> linear
> > addressing performance at all -- no clusters or chains effect.
> >
> > 3) Bigrams,
> >
> http://dawidweiss123.appspot.com/run/[email protected]/com.carrotsearch.hppc.caliper.BenchmarkBigramCounting
> >
> > This is a toy example, I know, but it's the only realistic one in the
> set,
> > so it's probably worth mentioning. It is a little bit puzzling to me why
> > HPPC is (by a small margin) better than fastutil because it really
> shouldn't
> > be. This isn't exactly noise (variance is given in plain text attached),
> but
> > I still consider this a glitch within the regular variance.
> >
> > You can try these on your own machine -- I would actually appreciate if
> you
> > sent your results and feedback back to me or this list so that we can see
> > how the above holds consistent betwen architectures, processors and VMs.
> >
> > git clone [email protected]:carrotsearch/hppc.git
> > cd hppc-others
> > (cd lib/; bash ./install-deps )
> > mvn clean package
> > java -server -jar hppc-others-0.4.0-SNAPSHOT/hppc-others*.jar | tee
> > benchmark.log
> >
> > I didn't check the performance of simple lists or other structures with
> > minimal overhead because I don't imagine they being significantly faster
> or
> > slower in any library. For these other facts play a role (I personally
> like
> > the fact I can access the internal buffers in HPPC, but it may against be
> a
> > point against using it ;).
> >
> > Dawid
> >
>
>

Re: Performance of primitive collections

Reply via email to