Dawid Weiss created MAHOUT-1242:
-----------------------------------
Summary: No key redistribution function for associative maps
Key: MAHOUT-1242
URL: https://issues.apache.org/jira/browse/MAHOUT-1242
Project: Mahout
Issue Type: Improvement
Components: collections, Math
Reporter: Dawid Weiss
Assignee: Benson Margulies
All integer-keyed maps currently use HashFunctions.hash(int), which simply
returns the key value:
{code}
/**
 * Returns a hashcode for the specified value.
 *
 * @return a hash code value for the specified value.
 */
public static int hash(int value) {
  return value;
  //return value * 0x278DDE6D; // see org.apache.mahout.math.jet.random.engine.DRand
  /*
  value &= 0x7FFFFFFF; // make it >=0
  int hashCode = 0;
  do hashCode = 31*hashCode + value%10;
  while ((value /= 10) > 0);
  return 28629151*hashCode; // spread even further; h*31^5
  */
}
{code}
This easily leads to degenerate behavior (long collision chains) on keys whose
lower bits are constant. A simple yet strong mixing function, such as the
final step of murmurhash3, goes a long way toward ensuring that keys are
distributed more uniformly regardless of the input distribution.
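
For illustration, here is a minimal sketch of what such a redistribution step
could look like. The mix method follows the 32-bit finalization step of
murmurhash3; the class name MixDemo and the helper bucketsUsed are hypothetical,
and the power-of-two table indexed by masking is an assumption made for the
demo, not a description of how the Mahout maps actually compute slots.

```java
import java.util.function.IntUnaryOperator;

final class MixDemo {
    private MixDemo() {}

    /** 32-bit finalization (avalanche) step of murmurhash3, applied to an int key. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /**
     * Hypothetical helper: counts how many slots of a power-of-two table are
     * hit by keyCount keys whose lower 8 bits are always zero.
     */
    static int bucketsUsed(IntUnaryOperator hash, int keyCount, int tableSize) {
        boolean[] used = new boolean[tableSize];
        int count = 0;
        for (int i = 0; i < keyCount; i++) {
            int key = i << 8;                                   // constant lower bits
            int slot = hash.applyAsInt(key) & (tableSize - 1);  // assumed masking scheme
            if (!used[slot]) {
                used[slot] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The identity hash sends every such key to slot 0.
        System.out.println("identity: " + bucketsUsed(k -> k, 1024, 16));
        // The mixed hash should spread the same keys over many slots.
        System.out.println("mixed:    " + bucketsUsed(MixDemo::mix, 1024, 16));
    }
}
```

With the identity function all 1024 keys fall into a single slot under this
assumed scheme, which is the long collision chain described above. Because each
step of mix is invertible, distinct keys still map to distinct hash values.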