Dawid,

Now I recall why I stopped working on features of Mahout collections :-)
HPPC.

We'll see who gets where first.

--benson


On Fri, Apr 2, 2010 at 10:06 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:

> > What's the use case for needing to vary the hash function? It's one of
> > those things where I assume there are incorrect ways to do it, and
> > correct ways, and among the correct ways fairly clear arguments about
> > which function will be better -- i.e. the object should provide the
> > best function.
>
> Unfortunately this is not true -- just recently I've hit a use case
> where the keys stored were Long values and their distribution had a
> very low variance in the lower bits. HPPC implemented open hashing
> using 2^n-sized arrays, with hashes reduced by a bitmask... this caused really,
> really long conflict chains for values that were actually very
> different. I looked at how JDK's HashMap solves this problem -- they
> do a simple rehashing scheme internally (so it's object hash and then
> remixing hash in a cascade). I've finally decided to allow external
> hash functions AND changed the _default_ hash function used for
> "remixing" to be murmur hash. Performance benchmarks show this yields
> virtually no degradation in execution time (the CPUs seem to spend
> most of their time waiting on cache misses anyway, so internal
> rehashing is not an issue).
>
> I must also apologize for a bit of inactivity with HPPC... Like I
> said, we have released it internally on our "labs" Web site here:
>
> http://labs.carrotsearch.com/hppc.html
>
> It doesn't mean we are turning our backs on contributing HPPC to
> Mahout -- quite the opposite, we would love to do it. But contrary to
> what I originally thought (to push HPPC to Mahout as soon as possible)
> I kind of grew reluctant because so many things are missing
> (equals/hashCode, Java collections adapters) or can be improved
> (documentation, faster iterators).
>
> So... I'm still going to experiment with HPPC in our labs, especially
> API-wise, release one or two versions in between and then kindly ask
> you to peek at the final (?) result and consider moving the code under
> the Mahout umbrella. Sounds good?
>
> Dawid
>
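
For readers of the archive, the low-bit collision problem described in the
quoted message can be reproduced with a small sketch. This is not HPPC's
code: the class name RemixDemo, the helper names, and the sample keys are
hypothetical, and the mixer used here is the MurmurHash3 64-bit finalizer,
chosen only as a well-known example of a "remixing" step (the message says
"murmur hash" without naming a variant). It counts how many distinct slots
of a 2^n table a set of long keys lands in, with and without remixing.

```java
import java.util.HashSet;
import java.util.Set;

public class RemixDemo {
    /** MurmurHash3 64-bit finalizer, used here only as an example mixer (assumption). */
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Hypothetical sample data: keys whose low 16 bits are always zero. */
    static long[] sampleKeys(int n) {
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = ((long) i) << 16;
        }
        return keys;
    }

    /**
     * Counts the distinct slots the keys map to in a table of tableSize
     * entries (tableSize must be a power of two), using hash & (tableSize - 1).
     */
    static int slotsUsed(long[] keys, int tableSize, boolean remix) {
        int mask = tableSize - 1;
        Set<Integer> used = new HashSet<>();
        for (long key : keys) {
            long h = remix ? mix(key) : key;
            used.add((int) (h & mask));
        }
        return used.size();
    }

    public static void main(String[] args) {
        long[] keys = sampleKeys(1000);
        // Without remixing, every key has zero low bits, so all share one slot.
        System.out.println("plain:   " + slotsUsed(keys, 1024, false));
        // With remixing, the high-bit differences reach the masked low bits.
        System.out.println("remixed: " + slotsUsed(keys, 1024, true));
    }
}
```

With the plain bitmask all 1000 keys fall into a single slot, which is the
"really long conflict chain" case; the remixed variant spreads them over
many slots at the cost of a few shifts and multiplies per lookup.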
