Krisztian Kasa created HIVE-24199:
-------------------------------------

             Summary: Incorrect result when subquey in exists contains limit
                 Key: HIVE-24199
                 URL: https://issues.apache.org/jira/browse/HIVE-24199
             Project: Hive
          Issue Type: Bug
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa


{code:java}
create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as orc;

insert into web_sales values
(1, 1),
(1, 2),
(2, 1),
(2, 2);

select * from web_sales ws1
where exists (select 1 from web_sales ws2 where ws1.ws_order_number = 
ws2.ws_order_number limit 1);
1       1
1       2
{code}
{code:java}
CBO PLAN:
HiveSemiJoin(condition=[=($0, $2)], joinType=[semi])
  HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1])
    HiveFilter(condition=[IS NOT NULL($0)])
      HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
  HiveProject(ws_order_number=[$0])
    HiveSortLimit(fetch=[1])          <-- This shouldn't be added
      HiveProject(ws_order_number=[$0])
        HiveFilter(condition=[IS NOT NULL($0)])
          HiveTableScan(table=[[default, web_sales]], table:alias=[ws2])
{code}
Limit n on the right side of the join reduces the result set coming from the 
right to only n record hence not all the ws_order_number values are included 
which leads to correctness issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to