[ 
https://issues.apache.org/jira/browse/HIVE-24199?focusedWorklogId=490265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-490265
 ]

ASF GitHub Bot logged work on HIVE-24199:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Sep/20 15:17
            Start Date: 24/Sep/20 15:17
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #1525:
URL: https://github.com/apache/hive/pull/1525


   ### What changes were proposed in this pull request?
   Fix the incorrect result returned when the subquery in an EXISTS predicate contains a LIMIT clause.
   
   ### Why are the changes needed?
   ```
   create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as orc;
   
   insert into web_sales values
   (1, 1),
   (1, 2),
   (2, 1),
   (2, 2);
   ```
   Query
   ```
   select * from web_sales ws1
   where exists (select 1 from web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number limit 1);
   ```
   should return 4 rows but returns only 2.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   New q test added.




Issue Time Tracking
-------------------

            Worklog Id:     (was: 490265)
    Remaining Estimate: 0h
            Time Spent: 10m

> Incorrect result when subquery in exists contains limit
> --------------------------------------------------------
>
>                 Key: HIVE-24199
>                 URL: https://issues.apache.org/jira/browse/HIVE-24199
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as orc;
> insert into web_sales values
> (1, 1),
> (1, 2),
> (2, 1),
> (2, 2);
> select * from web_sales ws1
> where exists (select 1 from web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number limit 1);
> -- Actual result (only 2 of the 4 expected rows):
> 1     1
> 1     2
> {code}
> {code:java}
> CBO PLAN:
> HiveSemiJoin(condition=[=($0, $2)], joinType=[semi])
>   HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1])
>     HiveFilter(condition=[IS NOT NULL($0)])
>       HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
>   HiveProject(ws_order_number=[$0])
>     HiveSortLimit(fetch=[1])          <-- This shouldn't be added
>       HiveProject(ws_order_number=[$0])
>         HiveFilter(condition=[IS NOT NULL($0)])
>           HiveTableScan(table=[[default, web_sales]], table:alias=[ws2])
> {code}
> A limit of n on the right side of the semi join cuts the whole right input
> down to n records, so not every ws_order_number value reaches the join.
> Outer rows whose key was dropped find no match and are filtered out, which
> makes the result incorrect.
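
The effect described above can be reproduced with a small model. This is a sketch only, not Hive code: plain Python lists stand in for the web_sales table, and the two function names are hypothetical. It assumes, as the CBO plan suggests, that the EXISTS is turned into a semi join and the limit stays on the right input.

```python
# Illustrative model only: a Python list stands in for the web_sales table.
# Each tuple is (ws_order_number, ws_warehouse_sk).
WEB_SALES = [(1, 1), (1, 2), (2, 1), (2, 2)]


def exists_per_outer_row(rows, limit):
    """Intended semantics: the limit applies to each correlated subquery run."""
    result = []
    for outer in rows:
        matches = [inner for inner in rows if inner[0] == outer[0]][:limit]
        if matches:  # EXISTS is true as soon as one row survives the limit
            result.append(outer)
    return result


def semi_join_with_limit_on_right(rows, limit):
    """Shape of the plan in the issue: the limit trims the whole right input."""
    # Hypothetical stand-in for HiveSortLimit(fetch=[limit]) above the right scan.
    right_keys = [inner[0] for inner in rows if inner[0] is not None][:limit]
    return [outer for outer in rows
            if outer[0] is not None and outer[0] in right_keys]


print(exists_per_outer_row(WEB_SALES, 1))           # all four rows
print(semi_join_with_limit_on_right(WEB_SALES, 1))  # [(1, 1), (1, 2)]
```

In this model the list order decides which right-side row survives the limit. In a real engine that choice is not guaranteed, so a different subset of rows could go missing.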



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
