Dawid, Now I recall why I stopped working on features of Mahout collections :-) HPPC.
We'll see who gets where first. --benson On Fri, Apr 2, 2010 at 10:06 AM, Dawid Weiss <dawid.we...@gmail.com> wrote: > > What's the use case for needing to vary the hash function? It's one of > > those things where I assume there are incorrect ways to do it, and > > correct ways, and among the correct ways fairly clear arguments about > > which function will be better -- i.e. the object should provide the > > best function. > > Unfortunately this is not true -- just recently I've hit a use case > where the keys stored were Long values and their distribution had a > very low variance in the lower bits. HPPC implemented open hashing > using 2^n arrays and hashes were modulo bitmask... this caused really, > really long conflict chains for values that were actually very > different. I looked at how JDK's HashMap solves this problem -- they > do a simple rehashing scheme internally (so it's object hash and then > remixing hash in a cascade). I've finally decided to allow external > hash functions AND changed the _default_ hash function used for > "remixing" to be murmur hash. Performance benchmarks show this yields > virtually no degradation in execution time (the CPUs seem to spend > most of their time waiting on cache misses anyway, so internal > rehashing is not an issue). > > I must also apologize for a bit of inactivity with HPPC... Like I > said, we have released it internally on our "labs" Web site here: > > http://labs.carrotsearch.com/hppc.html > > It doesn't mean we turn our backs on contributing HPPC to Mahout -- > the opposite, we would love to do it. But contrary to what I > originally thought (to push HPPC to Mahout as soon as possible) I kind > of grew reluctant because so many things are missing (equals/hashcode, > java collections adapters) or can be improved (documentation, faster > iterators). > > So... I'm still going to experiment with HPPC in our labs, especially > API-wise, release one or two versions in between and then kindly ask > you to peek at the final (?) result and consider moving the code under > Mahout umbrella. Sounds good? > > Dawid >