[ 
https://issues.apache.org/jira/browse/HADOOP-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504919#comment-13504919
 ] 

Radim Kolar commented on HADOOP-9088:
-------------------------------------

The hash partitioner can do the same thing as the Java hashtable: rehash
hashCode() to get a better distribution. The current hash quality in Hadoop is
low if you have a lot of similar strings like "aaaaaaaa" and "aaaaaab". In
average cases the partitions come out about 20% suboptimal, but in some
specific cases the split can be 80:20 instead of close to 50:50.
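
A minimal sketch of the rehashing idea, for illustration only. It assumes the
plain partition formula is the sign-bit mask plus modulus used by Hadoop's
HashPartitioner, and it borrows the bit-folding step (h ^ (h >>> 16)) from
java.util.HashMap in Java 8. The class name, method names and sample keys are
hypothetical and not part of any attached patch, and this simple fold is not
guaranteed to improve every key set.

```java
public class RehashPartitionSketch {

    // Plain partitioning: clear the sign bit of hashCode(), then take the modulus.
    public static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Supplemental hash in the style of java.util.HashMap (Java 8):
    // fold the high 16 bits into the low 16 bits.
    public static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Same as plainPartition, but applied to the rehashed value.
    public static int rehashedPartition(Object key, int numPartitions) {
        return (spread(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 2;
        int[] plain = new int[numPartitions];
        int[] rehashed = new int[numPartitions];

        // Hypothetical key set: many strings sharing a long common prefix.
        for (int i = 0; i < 1000; i++) {
            String key = "aaaaaa" + (i * 2);
            plain[plainPartition(key, numPartitions)]++;
            rehashed[rehashedPartition(key, numPartitions)]++;
        }

        // Print the per-partition counts so the two schemes can be compared.
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("partition " + p
                    + ": plain=" + plain[p]
                    + " rehashed=" + rehashed[p]);
        }
    }
}
```

Running main prints how many keys land in each partition under both schemes,
which is one way to check a key set for the kind of skew described above.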
                
> Add Murmur3 hash
> ----------------
>
>                 Key: HADOOP-9088
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9088
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>         Attachments: murmur3-2.txt, murmur3-3.txt, murmur3.txt
>
>
> faster and better than murmur2

