[
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706832#comment-14706832
]
Yongzhi Chen commented on HIVE-11604:
-------------------------------------
My fix will change query plan a little bit in some cases(especially wrong
results return without the fixes cases).
The failure is because of an extra select operator in the query plan, it should
be no harm: it does correct RS_5's output column
to the right sequence, although it might not be so important in this scenario.
Attach second patch to change the test output, and add more test cases which
succeed in master even without my fix to catch possible regressions in the
future.
In current master:
This query returns wrong results:
{noformat}
select ddd.id, ddd.fkey, aaa.name
from (
select id, fkey,
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
) ddd
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}
while following returns right value(only different is the extra ddd.rnum):
{noformat}
select ddd.id, ddd.fkey, aaa.name, ddd.rnum
from (
select id, fkey,
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
) ddd
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}
> HIVE return wrong results in some queries with PTF function
> -----------------------------------------------------------
>
> Key: HIVE-11604
> URL: https://issues.apache.org/jira/browse/HIVE-11604
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Affects Versions: 1.2.0, 1.1.0, 2.0.0
> Reporter: Yongzhi Chen
> Assignee: Yongzhi Chen
> Attachments: HIVE-11604.1.patch, HIVE-11604.2.patch
>
>
> Following query returns empty result which is not right:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey,
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
> ) ddd
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After remove row_number() over (partition by id, fkey) as rnum from query,
> the right result returns.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey,
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
> ) ddd
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> ....
> INFO : Ended Job = job_local1070163923_0017
> +---------+-----------+-----------+--+
> No rows selected (14.248 seconds)
> | ddd.id | ddd.fkey | aaa.name |
> +---------+-----------+-----------+--+
> +---------+-----------+-----------+--+
> 0: jdbc:hive2://localhost:10000> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey
> from tlb1 group by id, fkey
> ) ddd
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:10000> from (
> 0: jdbc:hive2://localhost:10000> select id, fkey
> 0: jdbc:hive2://localhost:10000> from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:10000> ) ddd
> 0: jdbc:hive2://localhost:10000>
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO : Number of reduce tasks not specified. Estimated from input data size:
> 1
> ...
> INFO : Ended Job = job_local672340505_0019
> +---------+-----------+-----------+--+
> 2 rows selected (14.383 seconds)
> | ddd.id | ddd.fkey | aaa.name |
> +---------+-----------+-----------+--+
> | 100 | 1 | key1 |
> | 200 | 1 | key1 |
> +---------+-----------+-----------+--+
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)