anishek created HIVE-21539:
------------------------------
Summary: GroupBy + where clause on same column results in incorrect query rewrite
Key: HIVE-21539
URL: https://issues.apache.org/jira/browse/HIVE-21539
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 4.0.0
Reporter: anishek
{code}
create table a (i int, j string);
insert into a values ( 1, 'a'),(2,'b');
explain extended select min(j) from a where j='a' group by j;
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0` |
| FROM `default`.`a` |
| WHERE `j` = 'a' |
| GROUP BY TRUE |
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Tez |
| DagId: anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
| Edges: |
| Reducer 2 <- Map 1 (SIMPLE_EDGE) |
| DagName: anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
| Vertices: |
| Map 1 |
| Map Operator Tree: |
| TableScan |
| alias: a |
| filterExpr: (j = 'a') (type: boolean) |
| Statistics: Num rows: 2 Data size: 170 Basic stats: COMPLETE Column stats: COMPLETE |
| GatherStats: false |
| Filter Operator |
| isSamplingPred: false |
| predicate: (j = 'a') (type: boolean) |
| Statistics: Num rows: 1 Data size: 85 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| Statistics: Num rows: 1 Data size: 85 Basic stats: COMPLETE Column stats: COMPLETE |
| Group By Operator |
| aggregations: min(true) |
| keys: true (type: boolean) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| key expressions: _col0 (type: boolean) |
| null sort order: a |
| sort order: + |
| Map-reduce partition columns: _col0 (type: boolean) |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| tag: -1 |
| value expressions: _col1 (type: boolean) |
| auto parallelism: true |
| Path -> Alias: |
| hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
| Path -> Partition: |
| hdfs://localhost:9000/tmp/hive/warehouse/a |
| Partition |
| base file name: a |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| properties: |
| COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
| bucket_count -1 |
| bucketing_version 2 |
| column.name.delimiter , |
| columns i,j |
| columns.comments |
| columns.types int:string |
| file.inputformat org.apache.hadoop.mapred.TextInputFormat |
| file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| location hdfs://localhost:9000/tmp/hive/warehouse/a |
| name default.a |
| numFiles 3 |
| numRows 2 |
| rawDataSize 6 |
| serialization.ddl struct a { i32 i, string j} |
| serialization.format 1 |
| serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| totalSize 16 |
| transient_lastDdlTime 1552903148 |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| properties: |
| COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
| bucket_count -1 |
| bucketing_version 2 |
| column.name.delimiter , |
| columns i,j |
| columns.comments |
| columns.types int:string |
| file.inputformat org.apache.hadoop.mapred.TextInputFormat |
| file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| location hdfs://localhost:9000/tmp/hive/warehouse/a |
| name default.a |
| numFiles 3 |
| numRows 2 |
| rawDataSize 6 |
| serialization.ddl struct a { i32 i, string j} |
| serialization.format 1 |
| serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| totalSize 16 |
| transient_lastDdlTime 1552903148 |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| name: default.a |
| name: default.a |
| Truncated Path -> Alias: |
| /a [a] |
| Reducer 2 |
| Needs Tagging: false |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: min(VALUE._col0) |
| keys: KEY._col0 (type: boolean) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| expressions: _col1 (type: boolean) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE |
| File Output Operator |
| compressed: false |
| GlobalTableId: 0 |
| directory: hdfs://localhost:9000/tmp/hive/anagarwal/20f7b890-606b-4815-a56e-ab3384ef58f5/hive_2019-03-18_15-35-35_644_3057456177912469405-1/-mr-10001/.hive-staging_hive_2019-03-18_15-35-35_644_3057456177912469405-1/-ext-10002 |
| NumFilesPerFileSink: 1 |
| Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE |
| Stats Publishing Key Prefix: hdfs://localhost:9000/tmp/hive/anagarwal/20f7b890-606b-4815-a56e-ab3384ef58f5/hive_2019-03-18_15-35-35_644_3057456177912469405-1/-mr-10001/.hive-staging_hive_2019-03-18_15-35-35_644_3057456177912469405-1/-ext-10002/ |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| properties: |
| columns _col0 |
| columns.types boolean |
| escape.delim \ |
| hive.serialization.extend.additional.nesting.levels true |
| serialization.escape.crlf true |
| serialization.format 1 |
| serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| TotalFiles: 1 |
| GatherStats: false |
| MultiFileSpray: false |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
{code}
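The query is rewritten with *true* substituted for column j: the optimized SQL computes MIN(TRUE) and groups by TRUE, so the result column is typed boolean rather than string.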
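To make the semantic difference concrete, here is a minimal sketch that runs both query shapes against the same two rows. It uses Python's built-in sqlite3 module purely as a stand-in engine (an assumption for illustration; it is not Hive and does not exercise Hive's optimizer), and it writes the boolean constant as (1 = 1) so that SQLite does not read it as a column ordinal in GROUP BY.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("create table a (i int, j text)")
conn.execute("insert into a values (1, 'a'), (2, 'b')")

# The query as written in the report.
original = conn.execute(
    "select min(j) from a where j = 'a' group by j"
).fetchall()

# Stand-in for the rewritten form: column j replaced by a boolean constant
# in both the aggregate and the grouping key.
rewritten = conn.execute(
    "select min(1 = 1) from a where j = 'a' group by (1 = 1)"
).fetchall()

print("original :", original)
print("rewritten:", rewritten)
```

The original form returns the string value of j, while the rewritten form returns a truth value, which matches the boolean columns.types shown in the plan's File Output Operator.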