[
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648853#comment-14648853
]
Matt McCline commented on HIVE-11410:
-------------------------------------
Right, postgres produces:
{code}
mmccline=# SELECT
t1.c1
FROM
t t1
JOIN
(SELECT
t2.c1,
MAX(t2.c2) AS c2
FROM
t t2
GROUP BY
t2.c1
) t3
ON t1.c2=t3.c2;
c1
-----
abc
(1 row)
{code}
And, Hive branch-1.1 produces the right result:
{code}
SELECT
t1.c1
FROM
t t1
JOIN
(SELECT
t2.c1,
MAX(t2.c2) AS c2
FROM
t t2
GROUP BY
t2.c1
) t3
ON t1.c2=t3.c2;
abc
{code}
Here is the EXPLAIN plan:
{code}
EXPLAIN
SELECT
t1.c1
FROM
t t1
JOIN
(SELECT
t2.c1,
MAX(t2.c2) AS c2
FROM
t t2
GROUP BY
t2.c1
) t3
ON t1.c2=t3.c2;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: t2
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
Select Operator
expressions: c1 (type: string), c2 (type: int)
outputColumnNames: c1, c2
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
Group By Operator
aggregations: max(c2)
keys: c1 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col1 (type: int)
Reduce Operator Tree:
Group By Operator
aggregations: max(VALUE._col0)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats:
NONE
Select Operator
expressions: _col1 (type: int)
outputColumnNames: _col1
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
stats: NONE
Filter Operator
predicate: _col1 is not null (type: boolean)
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
stats: NONE
Select Operator
expressions: _col1 (type: int)
outputColumnNames: _col1
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
stats: NONE
File Output Operator
compressed: false
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
key expressions: _col1 (type: int)
sort order: +
Map-reduce partition columns: _col1 (type: int)
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column
stats: NONE
TableScan
alias: t1
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
Filter Operator
predicate: c2 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
Reduce Output Operator
key expressions: c2 (type: int)
sort order: +
Map-reduce partition columns: c2 (type: int)
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE
Column stats: NONE
value expressions: c1 (type: string)
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 c2 (type: int)
1 _col1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column
stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
So, perhaps different environment variable setting(s) are causing a plan in
branch-1.1 to be produced that the no results case for this JIRA.
> Join with subquery containing a group by incorrectly returns no results
> -----------------------------------------------------------------------
>
> Key: HIVE-11410
> URL: https://issues.apache.org/jira/browse/HIVE-11410
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.0
> Reporter: Nicholas Brenwald
> Assignee: Matt McCline
> Priority: Minor
> Attachments: hive-site.xml
>
>
> Start by creating a table *t* with columns *c1* and *c2* and populate with 1
> row of data. For example create table *t* from an existing table which
> contains at least 1 row of data by running:
> {code}
> create table t as select 'abc' as c1, 0 as c2 from Y limit 1;
> {code}
> Table *t* looks like the following:
> ||c1||c2||
> |abc|0|
> Running the following query then returns zero results.
> {code}
> SELECT
> t1.c1
> FROM
> t t1
> JOIN
> (SELECT
> t2.c1,
> MAX(t2.c2) AS c2
> FROM
> t t2
> GROUP BY
> t2.c1
> ) t3
> ON t1.c2=t3.c2
> {code}
> However, we expected to see the following:
> ||c1||
> |abc|
> The problem seems to relate to the fact that in the subquery, we group by
> column *c1*, but this is not subsequently used in the join condition.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)