[
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412418#comment-15412418
]
Ashutosh Chauhan commented on HIVE-14442:
-----------------------------------------
+1
> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong
> result/plan in group by with hive.map.aggr=false
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch,
> HIVE-14442.3.patch
>
>
> Reproducer
> {code}
> set hive.cbo.returnpath.hiveop=true;
> set hive.map.aggr=false;
> create table abcd (a int, b int, c int, d int);
> LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
> {code}
> {code} explain select count(distinct a) from abcd group by b; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: abcd
>             Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: a (type: int)
>               outputColumnNames: a
>               Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: a (type: int), a (type: int)
>                 sort order: ++
>                 Map-reduce partition columns: a (type: int)
>                 Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: count(DISTINCT KEY._col1:0._col0)
>           keys: KEY._col0 (type: int)
>           mode: complete
>           outputColumnNames: b, $f1
>           Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: $f1 (type: bigint)
>             outputColumnNames: _o__c0
>             Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> {code} explain select count(distinct a) from abcd group by c; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: abcd
>             Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: a (type: int)
>               outputColumnNames: a
>               Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: a (type: int), a (type: int)
>                 sort order: ++
>                 Map-reduce partition columns: a (type: int)
>                 Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: count(DISTINCT KEY._col1:0._col0)
>           keys: KEY._col0 (type: int)
>           mode: complete
>           outputColumnNames: c, $f1
>           Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: $f1 (type: bigint)
>             outputColumnNames: _o__c0
>             Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> Both of the above cases have the wrong keys in the map-side Reduce Output
> Operator: both use a, a instead of b, a and c, a respectively.
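To illustrate why the key order in the Reduce Output Operator matters, here is a minimal Python sketch of the shuffle-then-aggregate pattern the plans describe. It is not Hive code. The rows are made up for illustration and are not the contents of in4.txt, and the helper name count_distinct_by_key is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows (a, b, c, d); NOT the contents of in4.txt.
ROWS = [
    (1, 10, 100, 0),
    (2, 10, 100, 0),
    (2, 20, 100, 0),
    (3, 20, 200, 0),
]

COLS = {"a": 0, "b": 1, "c": 2, "d": 3}


def count_distinct_by_key(rows, group_col, distinct_col):
    """Emulate a shuffle on (group_col, distinct_col), then count(DISTINCT) per group."""
    g, d = COLS[group_col], COLS[distinct_col]
    # Map side: emit the composite key; sorting mimics "sort order: ++".
    shuffled = sorted((row[g], row[d]) for row in rows)
    # Reduce side: the first key column is the group, the second is the distinct value.
    groups = defaultdict(set)
    for group_key, distinct_val in shuffled:
        groups[group_key].add(distinct_val)
    return {key: len(values) for key, values in groups.items()}


# Keys (b, a): what "count(distinct a) ... group by b" should shuffle on.
expected = count_distinct_by_key(ROWS, "b", "a")  # {10: 2, 20: 2}

# Keys (a, a): what the plans above shuffle on.
buggy = count_distinct_by_key(ROWS, "a", "a")     # {1: 1, 2: 1, 3: 1}
```

Under these assumptions, shuffling on (a, a) yields one group per distinct value of a, each with a count of 1, instead of one group per value of b.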