[
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649184#comment-14649184
]
Nicholas Brenwald commented on HIVE-11410:
------------------------------------------
Hi,
Thanks for taking a look at this so quickly.
I confirm we are using branch-1.1 (distributed as part of CDH 5.4.4). For
example, hive cli jar is named hive-cli-1.1.0-cdh5.4.4.jar. When we run 'hive'
on the command line, we see the following printed message showing the
hive-common-1.1.0 is being used.
{code}
Logging initialized using configuration in
jar:file:/cloudera/parcel-repo/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/hive-common-1.1.0-cdh5.4.4.jar!/hive-log4j.properties
{code}
And the explain plan we see is as follows:
{code}
hive> EXPLAIN
> SELECT
> t1.c1
> FROM
> t t1
> JOIN
> (SELECT
> t2.c1,
> MAX(t2.c2) AS c2
> FROM
> t t2
> GROUP BY
> t2.c1
> ) t3
> ON t1.c2=t3.c2;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-5 depends on stages: Stage-1
Stage-4 depends on stages: Stage-5
Stage-0 depends on stages: Stage-4
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: t2
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
Select Operator
expressions: c1 (type: string), c2 (type: int)
outputColumnNames: c1, c2
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
Group By Operator
aggregations: max(c2)
keys: c1 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col1 (type: int)
Reduce Operator Tree:
Group By Operator
aggregations: max(VALUE._col0)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats:
NONE
Filter Operator
predicate: _col1 is not null (type: boolean)
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-5
Map Reduce Local Work
Alias -> Map Local Tables:
t1
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
t1
TableScan
alias: t1
filterExpr: c2 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
Filter Operator
predicate: c2 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
HashTable Sink Operator
keys:
0 c2 (type: int)
1 _col1 (type: int)
Stage: Stage-4
Map Reduce
Map Operator Tree:
TableScan
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 c2 (type: int)
1 _col1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: true
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE
Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
> Join with subquery containing a group by incorrectly returns no results
> -----------------------------------------------------------------------
>
> Key: HIVE-11410
> URL: https://issues.apache.org/jira/browse/HIVE-11410
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.0
> Reporter: Nicholas Brenwald
> Assignee: Matt McCline
> Priority: Minor
> Attachments: hive-site.xml
>
>
> Start by creating a table *t* with columns *c1* and *c2* and populate with 1
> row of data. For example create table *t* from an existing table which
> contains at least 1 row of data by running:
> {code}
> create table t as select 'abc' as c1, 0 as c2 from Y limit 1;
> {code}
> Table *t* looks like the following:
> ||c1||c2||
> |abc|0|
> Running the following query then returns zero results.
> {code}
> SELECT
> t1.c1
> FROM
> t t1
> JOIN
> (SELECT
> t2.c1,
> MAX(t2.c2) AS c2
> FROM
> t t2
> GROUP BY
> t2.c1
> ) t3
> ON t1.c2=t3.c2
> {code}
> However, we expected to see the following:
> ||c1||
> |abc|
> The problem seems to relate to the fact that in the subquery, we group by
> column *c1*, but this is not subsequently used in the join condition.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)