Dawid Weiss created MAHOUT-1242:
-----------------------------------

             Summary: No key redistribution function for associative maps
                 Key: MAHOUT-1242
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1242
             Project: Mahout
          Issue Type: Improvement
          Components: collections, Math
            Reporter: Dawid Weiss
            Assignee: Benson Margulies


All integer-based maps currently use HashFunctions.hash(int), which simply 
returns the key value unchanged:
{code}
  /**
   * Returns a hashcode for the specified value.
   *
   * @return a hash code value for the specified value.
   */
  public static int hash(int value) {
    return value;

    //return value * 0x278DDE6D; // see org.apache.mahout.math.jet.random.engine.DRand

    /*
    value &= 0x7FFFFFFF; // make it >=0
    int hashCode = 0;
    do hashCode = 31*hashCode + value%10;
    while ((value /= 10) > 0);

    return 28629151*hashCode; // spread even further; h*31^5
    */
  }
 {code}

This easily leads to degenerate behavior (long collision chains) on keys whose 
lower bits are constant. A simple but strong mixing function, such as the 
final step of murmurhash3, goes a long way toward making the key distribution 
more uniform regardless of the input distribution.
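
To illustrate the point, here is a minimal sketch of the murmurhash3 32-bit 
finalizer next to the current identity hash. The class and method names 
(MixedHashSketch, identityHash, mixedHash, distinctBuckets) are hypothetical 
and not part of Mahout, and the power-of-two table indexed with a bit mask is 
an assumption made for the demonstration; the actual maps may compute slot 
indices differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; none of these names exist in Mahout.
final class MixedHashSketch {

  /** Identity hash, mirroring what HashFunctions.hash(int) returns today. */
  static int identityHash(int value) {
    return value;
  }

  /** Final mixing step (fmix32) of the 32-bit murmurhash3. */
  static int mixedHash(int value) {
    int h = value;
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  /**
   * Counts how many distinct slots are hit by the keys 0, stride, 2*stride, ...
   * Assumes tableSize is a power of two so that masking selects the low bits.
   */
  static int distinctBuckets(boolean mix, int keyCount, int stride, int tableSize) {
    Set<Integer> buckets = new HashSet<>();
    for (int i = 0; i < keyCount; i++) {
      int key = i * stride;
      int h = mix ? mixedHash(key) : identityHash(key);
      buckets.add(h & (tableSize - 1));
    }
    return buckets.size();
  }

  public static void main(String[] args) {
    // Keys are multiples of 1024, so their lower 10 bits are always zero.
    System.out.println("identity: " + distinctBuckets(false, 1024, 1024, 1024));
    System.out.println("mixed:    " + distinctBuckets(true, 1024, 1024, 1024));
  }
}
```

With the identity hash every key in this example falls into the same slot, 
while the mixed hash spreads them over many slots. Each step of the finalizer 
is invertible, so distinct keys still produce distinct 32-bit hash values.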

