Re: Performance of primitive collections

Robin Anil Wed, 05 Jun 2013 18:15:11 -0700

Dawid, could you explain how this is accessing the mahout trunk code? Is
that part of the package step?


Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.


On Wed, Jun 5, 2013 at 8:20 PM, Sean Owen <[email protected]> wrote:

> On Wed, Jun 5, 2013 at 7:00 PM, Dawid Weiss
> <[email protected]> wrote:
> > 1) if you know your keys are 0...N-1 (dense) then there is typically no
> > need to use a map; you'll do just as well with a regular lookup array :)
>
> Yeah of course... here I think there is no guarantee of this although
> in many practical use cases the keys are drawn from a small range. For
> example 1000 user IDs are often around the range 1 - 1000 or
> something.
>
>
> > Having written the above it made me wonder why Mahout's performance does
> > not drop significantly on those malicious upper-bit changing benchmarks,
> so
> > I checked the code. The thing is Mahout (and I think Colt, originally)
> uses
> > double hashing with prime-sized tables (not 2^n-1 masking) -- this is
> nice
> > and indeed avoids the problems of clustering mentioned in pt. 2 above.
>
> Yes I thought this was why you could largely get away with it. If the
> table is big enough, collisions and chains should not be a big deal to
> begin with.
>
> Since you have this benchmark readily available -- what happens if you
> un-comment one or both of the lines of code that mixes the hash? That
> would really answer it.
>
> i'm curious as I have certainly made this assumption independently in
> the custom hashtable implementations I made long ago (which also use
> double hashing) and still use in the project. If it's not optimal
> that's important info.
>

Re: Performance of primitive collections

Reply via email to