Krisztian Kasa created HIVE-24199: ------------------------------------- Summary: Incorrect result when subquey in exists contains limit Key: HIVE-24199 URL: https://issues.apache.org/jira/browse/HIVE-24199 Project: Hive Issue Type: Bug Reporter: Krisztian Kasa Assignee: Krisztian Kasa
{code:java} create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as orc; insert into web_sales values (1, 1), (1, 2), (2, 1), (2, 2); select * from web_sales ws1 where exists (select 1 from web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number limit 1); 1 1 1 2 {code} {code:java} CBO PLAN: HiveSemiJoin(condition=[=($0, $2)], joinType=[semi]) HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_sales]], table:alias=[ws1]) HiveProject(ws_order_number=[$0]) HiveSortLimit(fetch=[1]) <-- This shouldn't be added HiveProject(ws_order_number=[$0]) HiveFilter(condition=[IS NOT NULL($0)]) HiveTableScan(table=[[default, web_sales]], table:alias=[ws2]) {code} Limit n on the right side of the join reduces the result set coming from the right to only n record hence not all the ws_order_number values are included which leads to correctness issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)