[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750878#comment-13750878 ]
Yin Huai commented on HIVE-5149:
--------------------------------

Suppose that we have a parent RS and a child RS. If the child RS can be removed, ReduceSinkDeDuplication always assigns the more specific partitioning columns to the parent RS. For example, for "GROUP BY a, b DISTRIBUTE BY a", the RS in the resulting single MR job uses "a" and "b" as partitioning columns. Rows with the same value of "a" but different values of "b" can then be sent to different reducers, which does not satisfy "DISTRIBUTE BY a".

It seems we need to change ReduceSinkDeDuplication to use the more general partitioning columns instead. In this example, that means using only "a" as the partitioning column. Note that this change can limit the parallelism of the reduce phase.

> ReduceSinkDeDuplication can pick the wrong partitioning columns
> ---------------------------------------------------------------
>
>                 Key: HIVE-5149
>                 URL: https://issues.apache.org/jira/browse/HIVE-5149
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
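
To illustrate the partitioning problem described in the comment, here is a minimal Python sketch. It is not Hive code: the helper names `reducer_for` and `reducers_per_a`, the toy `31 * h + v` hash, and the sample rows are all assumptions made for illustration. It only shows that hashing on ("a", "b") can scatter rows sharing the same "a" across reducers, while hashing on "a" alone cannot.

```python
def reducer_for(row, key_cols, num_reducers):
    """Pick a reducer by hashing the key columns (toy hash, not Hive's)."""
    h = 0
    for col in key_cols:
        h = 31 * h + row[col]
    return h % num_reducers


def reducers_per_a(rows, key_cols, num_reducers):
    """Map each value of column "a" to the set of reducers that receive it."""
    seen = {}
    for row in rows:
        target = reducer_for(row, key_cols, num_reducers)
        seen.setdefault(row["a"], set()).add(target)
    return seen


# Hypothetical input rows with integer columns "a" and "b".
rows = [
    {"a": 1, "b": 1},
    {"a": 1, "b": 2},
    {"a": 2, "b": 1},
    {"a": 2, "b": 2},
]

# More specific partitioning columns ("a", "b"): each value of "a"
# ends up on both reducers, so "DISTRIBUTE BY a" is not honored.
specific = reducers_per_a(rows, ["a", "b"], 2)

# More general partitioning column ("a"): each value of "a"
# goes to exactly one reducer.
general = reducers_per_a(rows, ["a"], 2)

print("partition on (a, b):", specific)
print("partition on (a):   ", general)
```

The sketch also hints at the parallelism trade-off mentioned in the comment: with only "a" as the key, the number of busy reducers is bounded by the number of distinct values of "a".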