[ https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505673#comment-13505673 ]
Robert Joseph Evans commented on MAPREDUCE-4827:
------------------------------------------------
I can see that some poor-quality hashCode() implementations may need better
hashing, and the patch looks OK. I am not an expert on hash functions, but
from what I know it looks good. Do you have concrete numbers showing how it
improved the distribution in some specific cases?
> Increase hash quality of HashPartitioner
> ----------------------------------------
>
> Key: MAPREDUCE-4827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Radim Kolar
> Attachments: betterhash1.txt
>
>
> HashPartitioner uses Object.hashCode() to split keys into partitions. This
> results in bad distributions because hashCode() quality is poor.
> These hashCode() functions are sometimes written by hand (very poor quality)
> and sometimes generated by Commons Lang code (poor quality). Applying
> some transformation on top of hashCode() provides a better distribution.
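
For illustration, here is a minimal sketch of the idea in the description
above: run key.hashCode() through a bit-mixing step before taking the modulus.
This is not the code from betterhash1.txt, which is not reproduced in this
message. The mixing step is the 32-bit finalizer from MurmurHash3, chosen here
as a stand-in, and the class and method names are hypothetical. The baseline
method uses the same arithmetic as HashPartitioner.getPartition().

```java
// Sketch only: not the attached patch. All names here are hypothetical.
final class MixedHashPartitionSketch {

    // Baseline: same arithmetic as Hadoop's HashPartitioner.getPartition().
    static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Assumed mixing step: the 32-bit finalizer from MurmurHash3.
    // It spreads the influence of every input bit over the whole word.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Same as plainPartition, but mixes the hash code first.
    static int mixedPartition(Object key, int numPartitions) {
        return (mix(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    // Counts how many of numKeys keys land in each partition. The keys are
    // Integers that are multiples of 16, standing in for a weak hashCode()
    // (Integer.hashCode() returns the value itself).
    static int[] histogram(boolean useMix, int numKeys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numKeys; i++) {
            Integer key = i * 16;
            int p = useMix
                    ? mixedPartition(key, numPartitions)
                    : plainPartition(key, numPartitions);
            counts[p]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("plain: "
                + java.util.Arrays.toString(histogram(false, 1000, 16)));
        System.out.println("mixed: "
                + java.util.Arrays.toString(histogram(true, 1000, 16)));
    }
}
```

With 16 partitions, the plain version sends every one of these keys to
partition 0, because all the hash codes are multiples of 16. The mixed version
should spread them across partitions; the exact counts depend on the mixing
function, so a histogram like this is one way to produce concrete numbers for
real key sets.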