Eric Hanson created HADOOP-10097:
------------------------------------

             Summary: Extend JenkinsHash package interface to allow increased code sharing
                 Key: HADOOP-10097
                 URL: https://issues.apache.org/jira/browse/HADOOP-10097
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Eric Hanson
            Assignee: Eric Hanson
            Priority: Minor


I copied some code from org.apache.hadoop.util.hash.JenkinsHash, added it to 
org.apache.hadoop.hive.ql.exec.vector.expressions.CuckooSetBytes, and modified 
it slightly because the interface was not quite right for use in CuckooSetBytes. 

I propose modifying org.apache.hadoop.util.hash.JenkinsHash to provide an 
additional interface function:

public int hash(byte[] key, int start, int nbytes, int initval)

This would return a hash value for the sequence of bytes beginning at start and 
ending at start + nbytes (exclusive).

The existing interface function in org.apache.hadoop.util.hash.JenkinsHash

public int hash(byte[] key, int nbytes, int initval)

would then be modified to call this new function. The original hash() function 
does not take a start parameter, and always assumes the key in byte[] key 
starts at position 0. This will expand the use cases for the JenkinsHash 
package. At that point, the Hive CuckooSetBytes class can be modified so that 
it can reference the JenkinsHash package of Hadoop and use it directly, rather 
than using a copied and modified version of the code locally.
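
As a rough illustration of the delegation described above, here is a minimal 
sketch. The class name OffsetHashSketch is hypothetical, the bounds check is 
an assumption, and the mixing loop is a simple stand-in, not the actual 
Jenkins algorithm in JenkinsHash. Only the two method signatures come from 
this proposal.

```java
// Illustrative sketch only. The mixing step below is a placeholder and is
// NOT the Jenkins hash implemented in org.apache.hadoop.util.hash.JenkinsHash.
final class OffsetHashSketch {

  /** Proposed overload: hashes the bytes key[start] .. key[start + nbytes - 1]. */
  public int hash(byte[] key, int start, int nbytes, int initval) {
    // Assumed bounds check; the proposal does not specify error handling.
    if (start < 0 || nbytes < 0 || start > key.length - nbytes) {
      throw new IndexOutOfBoundsException(
          "start=" + start + ", nbytes=" + nbytes + ", length=" + key.length);
    }
    int h = initval;
    for (int i = start; i < start + nbytes; i++) {
      h = 31 * h + (key[i] & 0xff); // stand-in mixing step
    }
    return h;
  }

  /** Existing signature: kept for current callers, now delegates with start = 0. */
  public int hash(byte[] key, int nbytes, int initval) {
    return hash(key, 0, nbytes, initval);
  }
}
```

With this shape, a caller such as CuckooSetBytes could hash a key stored in 
the middle of a larger buffer without first copying it into its own array.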

Existing users of hash(byte[] key, int nbytes, int initval) would then pay the 
cost of an extra function call. If the performance implications of this worry 
anyone, please comment. Alternatives would be to copy the new version of hash() 
in its entirety into JenkinsHash, or simply not to do this JIRA.




--
This message was sent by Atlassian JIRA
(v6.1#6144)
