> What's the use case for needing to vary the hash function? It's one of
> those things where I assume there are incorrect ways to do it, and
> correct ways, and among the correct ways fairly clear arguments about
> which function will be better -- i.e. the object should provide the
> best function.
Unfortunately this is not true -- just recently I hit a use case where the stored keys were Long values whose distribution had very low variance in the lower bits. HPPC implements open hashing with 2^n-sized arrays, so the slot index is the hash taken modulo a bitmask... this caused really, really long conflict chains for values that were actually very different. I looked at how the JDK's HashMap solves this problem -- it rehashes internally (the object's hash code is remixed in a cascade of shifts and XORs). In the end I decided to allow external hash functions AND changed the _default_ hash function used for "remixing" to murmur hash. Performance benchmarks show this yields virtually no degradation in execution time (the CPUs seem to spend most of their time waiting on cache misses anyway, so the internal remixing is not an issue).

I must also apologize for a bit of inactivity with HPPC... Like I said, we have released it internally on our "labs" Web site here:

http://labs.carrotsearch.com/hppc.html

It doesn't mean we are turning our backs on contributing HPPC to Mahout -- quite the opposite, we would love to do it. But contrary to what I originally thought (to push HPPC into Mahout as soon as possible), I have grown a bit reluctant because so many things are still missing (equals/hashCode, Java Collections adapters) or could be improved (documentation, faster iterators).

So... I'm still going to experiment with HPPC in our labs, especially API-wise, release one or two versions in between, and then kindly ask you to take a peek at the final (?) result and consider moving the code under the Mahout umbrella. Sounds good?

Dawid
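
P.S. For anyone curious why remixing helps here, below is a minimal sketch -- not HPPC's actual code, and all names (HashMixDemo, mix64) are illustrative only. The mix64 method is the standard MurmurHash3 64-bit finalizer; the point is that when the table size is 2^n and the index is (hash & mask), keys whose entropy sits above the mask all collapse into the same slot unless the bits are remixed first.

    // A minimal, self-contained sketch (illustrative names, not HPPC internals).
    public final class HashMixDemo {

        /** MurmurHash3's 64-bit finalizer: avalanches every input bit. */
        static long mix64(long k) {
            k ^= k >>> 33;
            k *= 0xff51afd7ed558ccdL;
            k ^= k >>> 33;
            k *= 0xc4ceb9fe1a85ec53L;
            k ^= k >>> 33;
            return k;
        }

        public static void main(String[] args) {
            final int mask = (1 << 4) - 1; // a 16-slot table, as in a 2^n-sized open hash array

            for (long i = 0; i < 8; i++) {
                long key = i << 32;                        // variance only in the upper bits
                int naiveSlot = (int) (key & mask);        // always 0: one long conflict chain
                int mixedSlot = (int) (mix64(key) & mask); // spread across the table
                System.out.printf("key=%016x naive=%d mixed=%d%n", key, naiveSlot, mixedSlot);
            }
        }
    }

With the naive masking every key above lands in slot 0; after mixing they spread over the table, which is exactly the degenerate case I ran into with those Long keys.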