[
https://issues.apache.org/jira/browse/HIVE-20570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619459#comment-16619459
]
Janaki Lahorani commented on HIVE-20570:
----------------------------------------
The test failure is not related to this patch.
> Union ALL with hive.optimize.union.remove=true has incorrect plan
> -----------------------------------------------------------------
>
> Key: HIVE-20570
> URL: https://issues.apache.org/jira/browse/HIVE-20570
> Project: Hive
> Issue Type: Bug
> Reporter: Janaki Lahorani
> Assignee: Janaki Lahorani
> Priority: Major
> Attachments: HIVE-20570.1.patch, HIVE-20570.2.patch
>
>
> When hive.optimize.union.remove=true and a select query is run with group by,
> the final fetch is waiting only for one of the branches and not both.
> Test Case:
> {code}
> create table if not exists test_table(column1 string, column2 int);
> insert into test_table values('a',1),('b',2);
> set hive.optimize.union.remove=true;
> set mapred.input.dir.recursive=true;
> explain
> select column1 from test_table group by column1
> union all
> select column1 from test_table group by column1;
> {code}
> In the below the two stages correspond to the two parts of union all. But
> the final fetch operator (Stage 0) only depends on one of the stages, but it
> should depend on both.
> Plan:
> {code}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-2 is a root stage
> *Stage-0 depends on stages: Stage-1*
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Map Operator Tree:
> TableScan
> alias: test_table
> Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column
> stats: NONE
> Select Operator
> expressions: column1 (type: string)
> outputColumnNames: column1
> Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE
> Column stats: NONE
> Group By Operator
> keys: column1 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE
> Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE
> Column stats: NONE
> Execution mode: vectorized
> Reduce Operator Tree:
> Group By Operator
> keys: KEY._col0 (type: string)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column
> stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column
> stats: NONE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-2
> Map Reduce
> Map Operator Tree:
> TableScan
> alias: test_table
> Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column
> stats: NONE
> Select Operator
> expressions: column1 (type: string)
> outputColumnNames: column1
> Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE
> Column stats: NONE
> Group By Operator
> keys: column1 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE
> Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE
> Column stats: NONE
> Execution mode: vectorized
> Reduce Operator Tree:
> Group By Operator
> keys: KEY._col0 (type: string)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column
> stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column
> stats: NONE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)