[ https://issues.apache.org/jira/browse/HIVE-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025996#comment-14025996 ]
Gunther Hagleitner commented on HIVE-7074:
------------------------------------------

I believe that's unnecessary given HIVE-7121. Once we do a decent job of hashing the keys the prime number reducer requirement goes away.

> The reducer parallelism should be a prime number for better stride protection
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-7074
>                 URL: https://issues.apache.org/jira/browse/HIVE-7074
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-7074.1.patch
>
>
> The current hive reducer parallelism results in stride issues with key distribution.
> a JOIN generating even numbers will get strided onto only some of the reducers.
> The probability of distribution skew is controlled by the number of common factors shared by the hashcode of the key and the number of buckets.
> Using a prime number within the reducer estimation will cut that probability down by a significant amount.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
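
For readers following the stride argument in the issue description, here is a minimal sketch of the effect. It is not Hive code. It assumes a simplified partitioner of the form hash(key) mod number-of-reducers, and an identity hash for integer keys (which is how Java's Integer.hashCode() behaves). The names key_hash, reducer_load and even_keys are hypothetical and exist only for this illustration.

```python
from collections import Counter


def key_hash(key):
    # Assumption: identity hash for integers, mirroring Java's Integer.hashCode().
    return key


def reducer_load(keys, num_reducers):
    """Count how many keys land on each reducer under hash-modulo partitioning."""
    load = Counter()
    for key in keys:
        load[key_hash(key) % num_reducers] += 1
    # Report every reducer, including the ones that received no keys.
    return [load[r] for r in range(num_reducers)]


# Hypothetical input: a join that only produces even integer keys.
even_keys = range(0, 2000, 2)

# 10 reducers share the factor 2 with every key, so odd-numbered reducers stay idle.
print(reducer_load(even_keys, 10))

# 11 is prime and shares no factor with the stride, so every reducer receives keys.
print(reducer_load(even_keys, 11))
```

With a hash that mixes the key bits well, which is what the comment attributes to HIVE-7121, the identity-hash assumption above no longer holds and the reducer count should matter far less.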