[https://issues.apache.org/jira/browse/HIVE-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603481#comment-14603481]
Mohit Sabharwal commented on HIVE-11032:
----------------------------------------
Thanks [~lirui], yes, I verified that the query plan is in line with what we see in MR.
When {{hive.groupby.skewindata=true}} is set and there is no DISTINCT
clause, the Reduce Output Operator partitions rows by {{rand()}} rather than by
the group-by keys. The subsequent reducer then does partial aggregation, and the
following reducer does final aggregation.
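To make the two-stage data flow concrete, here is a minimal Python simulation for a {{COUNT(*) ... GROUP BY key}} with no DISTINCT. It is a sketch, not Hive code: the function name {{skewed_groupby_count}} and the {{num_reducers}} parameter are hypothetical, and the random stage-1 partitioning only stands in for the {{rand()}} partitioning in the real plan.

```python
import random
from collections import Counter, defaultdict


def skewed_groupby_count(rows, num_reducers=4, seed=0):
    """Hypothetical simulation of COUNT(*) GROUP BY key with skewindata on.

    Not Hive's implementation; it only mirrors the plan's data flow.
    """
    rng = random.Random(seed)

    # Stage 1 shuffle: send each row to a random reducer (stands in for the
    # rand() partitioning), so a hot key is spread across all reducers.
    stage1 = defaultdict(list)
    for key in rows:
        stage1[rng.randrange(num_reducers)].append(key)

    # Stage 1 reducers: partial aggregation (per-key counts per partition).
    partials = [Counter(part) for part in stage1.values()]

    # Stage 2 shuffle: partition partial results by key, so every partial
    # count for a given key reaches the same reducer.
    stage2 = defaultdict(list)
    for partial in partials:
        for key, cnt in partial.items():
            stage2[hash(key) % num_reducers].append((key, cnt))

    # Stage 2 reducers: final aggregation (sum the partial counts).
    final = {}
    for part in stage2.values():
        for key, cnt in part:
            final[key] = final.get(key, 0) + cnt
    return final
```

For example, with {{rows = ["a"] * 1000 + ["b"] * 3 + ["c"]}} the hot key {{"a"}} is split across all stage-1 reducers, yet the final result is still {{{"a": 1000, "b": 3, "c": 1}}}, because stage 2 regroups the partial counts by key.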
I also verified the behavior for other cases, for example when
{{hive.map.aggr=true}} is set in addition to {{hive.groupby.skewindata=true}},
as documented here:
https://cwiki.apache.org/confluence/display/Hive/GroupByWithRollup
The {{index_bitmap3}} test failure is unrelated to this patch.
> Enable more tests for grouping by skewed data [Spark Branch]
> ------------------------------------------------------------
>
> Key: HIVE-11032
> URL: https://issues.apache.org/jira/browse/HIVE-11032
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Rui Li
> Assignee: Mohit Sabharwal
> Priority: Minor
> Attachments: HIVE-11032.1-spark.patch, HIVE-11032.2-spark.patch
>
>
> Not all such tests are enabled, e.g. {{groupby1_map_skew.q}}. We can use
> this JIRA to track whether we need more of them.
> Basically, we need to look at all tests with
> {{set hive.groupby.skewindata=true;}}.
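A quick way to list the candidate tests is to scan the {{.q}} files for that setting. The following is a minimal Python sketch: the helper name {{find_skew_tests}} is hypothetical, and the directory {{ql/src/test/queries/clientpositive}} in the usage note is an assumption about where the test queries live in the Hive source tree. Note that it also matches commented-out {{set}} lines.

```python
from pathlib import Path

SETTING = "hive.groupby.skewindata=true"


def find_skew_tests(root):
    """Hypothetical helper: sorted names of .q files under root that set skewindata."""
    matches = []
    for path in Path(root).rglob("*.q"):
        text = path.read_text(errors="ignore")
        # Drop all whitespace and lowercase, so variants such as
        # "SET hive.groupby.skewindata = true;" still match.
        if SETTING in "".join(text.split()).lower():
            matches.append(path.name)
    return sorted(matches)
```

Usage (path assumed): {{find_skew_tests("ql/src/test/queries/clientpositive")}}, then compare the result against the list of tests enabled for Spark.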