[ https://issues.apache.org/jira/browse/HIVE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745110#comment-15745110 ]
Rui Li commented on HIVE-15357: ------------------------------- When I tried to re-enable these tests, I found this issue with DPP. Running the following query in our qtest: {code} EXPLAIN select count(*) from srcpart join (select ds as ds, ds as `date` from srcpart group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08'; {code} will get this following plan: {noformat} STAGE DEPENDENCIES: Stage-2 is a root stage Stage-1 depends on stages: Stage-2 Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-2 Spark Edges: Reducer 7 <- Map 6 (GROUP, 1) DagName: lirui_20161213205144_16c810ef-52f6-40c9-a346-7dbd2c9ef99e:2 Vertices: Map 6 Map Operator Tree: TableScan alias: srcpart Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Select Operator Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: '2008-04-08' (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: '2008-04-08' (type: string) sort order: + Map-reduce partition columns: '2008-04-08' (type: string) Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Map 8 Map Operator Tree: TableScan alias: srcpart Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ds (type: string) outputColumnNames: _col0 Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE Spark Partition Pruning Sink Operator partition key expr: ds Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE target column name: ds target work: Map 1 Reducer 7 Reduce Operator Tree: Group By Operator keys: '2008-04-08' (type: string) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Select Operator Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: '2008-04-08' (type: string) outputColumnNames: _col0 Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Spark Partition Pruning Sink Operator partition key expr: ds Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE target column name: ds target work: Map 5 Stage: Stage-1 Spark Edges: Reducer 2 <- Map 1 (GROUP, 1) Reducer 3 <- Map 5 (PARTITION-LEVEL SORT, 1), Reducer 2 (PARTITION-LEVEL SORT, 1) Reducer 4 <- Reducer 3 (GROUP, 1) DagName: lirui_20161213205144_16c810ef-52f6-40c9-a346-7dbd2c9ef99e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: srcpart Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Select Operator Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: '2008-04-08' (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: '2008-04-08' (type: string) sort order: + Map-reduce partition columns: '2008-04-08' (type: string) Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE Map 5 Map Operator Tree: TableScan alias: srcpart Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: ds (type: string) sort order: + Map-reduce partition columns: ds (type: string) Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE Reducer 2 Reduce Operator Tree: Group By Operator keys: '2008-04-08' (type: string) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Select Operator Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: '2008-04-08' (type: string) sort order: + Map-reduce partition columns: '2008-04-08' (type: string) Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE Reducer 3 Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 keys: 0 ds (type: string) 1 _col0 (type: string) Statistics: Num rows: 2200 Data size: 23372 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: count() mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Reducer 4 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} There's an extra MapWork8 in stage-2. The reason is because the joining key is partition key for both tables. So we created two DPP operators for the query. Tez doesn't have this problem because it has {{TezCompiler::runCycleAnalysisForPartitionPruning}} to remove such cyclic DPP dependency. I think we need to to the same, but not sure why we didn't see the issue when we implemented the feature. I'll find more about it. [~csun], [~xuefuz] any ideas? > Fix and re-enable the spark-only tests > -------------------------------------- > > Key: HIVE-15357 > URL: https://issues.apache.org/jira/browse/HIVE-15357 > Project: Hive > Issue Type: Test > Reporter: Rui Li > Assignee: Rui Li > > Defined by {{spark.only.query.files}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)