[ https://issues.apache.org/jira/browse/HIVE-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070235#comment-16070235 ]

Bing Li edited comment on HIVE-16659 at 6/30/17 3:11 PM:
---------------------------------------------------------

This patch is based on branch-2.3.
With the above changes, I could get the explain result as below.

hive> {color:red}set hive.spark.use.groupby.shuffle=true;{color}
hive> explain select key, count(val) from t1 group by key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        {color:red}Reducer 2 <- Map 1 (GROUP, 2){color}
      DagName: root_20170630080539_565b5a00-822e-46e9-a146-be84723ae7f6:2
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t1
                  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: key (type: int), val (type: string)
                    outputColumnNames: key, val
                    Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(val)
                      keys: key (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 51.289 seconds, Fetched: 54 row(s)

hive> {color:red}set hive.spark.use.groupby.shuffle=false;{color}
hive> explain select key, count(val) from t1 group by key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        {color:red}Reducer 2 <- Map 1 (GROUP PARTITION-LEVEL SORT, 2){color}
      DagName: root_20170630075518_b84add65-57db-466f-9521-3f1b14de6826:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t1
                  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: key (type: int), val (type: string)
                    outputColumnNames: key, val
                    Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(val)
                      keys: key (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 49.372 seconds, Fetched: 54 row(s)


> Query plan should reflect hive.spark.use.groupby.shuffle
> --------------------------------------------------------
>
>                 Key: HIVE-16659
>                 URL: https://issues.apache.org/jira/browse/HIVE-16659
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Bing Li
>         Attachments: HIVE-16659.1.patch
>
>
> It's useful to show the shuffle type used in the query plan. Currently it
> shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)