[ https://issues.apache.org/jira/browse/HADOOP-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504919#comment-13504919 ]
Radim Kolar commented on HADOOP-9088:
-------------------------------------
The hash partitioner could do what Java's hash table implementation does:
rehash the hashCode() result to get a better distribution. Hash quality in
Hadoop is currently low. If you have many similar strings such as "aaaaaaaa"
and "aaaaaab", partitions come out about 20% suboptimal in the average case,
and in some specific cases the split can be 80:20 instead of close to 50:50.
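
A minimal sketch of the rehashing idea, not taken from the attached patches.
It assumes "java hashtable" refers to the supplemental hash that
java.util.HashMap applied to hashCode() in Java 6, and it assumes the
partition formula (hashCode() & Integer.MAX_VALUE) % numPartitions. The class
name, method names, and sample keys are hypothetical and exist only for
illustration.

```java
final class RehashPartitionSketch {

    // Supplemental hash in the style of java.util.HashMap (Java 6):
    // mixes higher bits into the lower bits so that keys whose hash codes
    // differ only in a few bit positions are spread further apart.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Partition chosen directly from hashCode() (assumed baseline formula).
    static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Same formula, but applied to the rehashed value.
    static int rehashedPartition(Object key, int numPartitions) {
        return (spread(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    // Counts how many keys land in each partition, with or without rehashing.
    static int[] countPerPartition(String[] keys, int numPartitions, boolean rehash) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            int p = rehash ? rehashedPartition(key, numPartitions)
                           : plainPartition(key, numPartitions);
            counts[p]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample of similar keys; real skew depends on the data.
        String[] keys = new String[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "aaaaaa" + i;
        }
        int[] plain = countPerPartition(keys, 2, false);
        int[] rehashed = countPerPartition(keys, 2, true);
        System.out.println("plain:    " + plain[0] + " / " + plain[1]);
        System.out.println("rehashed: " + rehashed[0] + " / " + rehashed[1]);
    }
}
```

Running main prints the per-partition counts for both variants, so the two
distributions can be compared on a given key set. Whether rehashing helps, and
by how much, depends on the keys and on the number of partitions.
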
> Add Murmur3 hash
> ----------------
>
> Key: HADOOP-9088
> URL: https://issues.apache.org/jira/browse/HADOOP-9088
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Radim Kolar
> Assignee: Radim Kolar
> Attachments: murmur3-2.txt, murmur3-3.txt, murmur3.txt
>
>
> Faster and better than murmur2.