[ 
https://issues.apache.org/jira/browse/HIVE-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037307#comment-15037307
 ] 

Chaoyu Tang commented on HIVE-12566:
------------------------------------

I think that pushing down of where clause to join condition should not be 
applicable to any outer join. Otherwise, the initially matched rows from two 
joined tables might no longer match with more join conditions, and an extra row 
will result in some outer join cases. 
For an example, in above provided case, the ltable row 
(1,NULL,CD5415192314304,00071) could join with rtable row 
(1,CD5415192314304,00071) on condition (l.lk1 = r.rk1 AND l.lk2 = r.rk2), and 
generates a joined row (1,NULL,CD5415192314304,00071, 1,CD5415192314304,00071), 
but which is filtered out in the final result due to not meeting the where 
clause.
But if the where clause is pushed down to the join condition, the rtable row 
(1,CD5415192314304,00071) does no longer match ltable row 
(1,NULL,CD5415192314304,00071) on the new stricter condition
(l.lk1 = r.rk1 AND l.lk2 = r.rk2 and 
COALESCE(l.la,'EMPTY')=COALESCE(r.ra,'EMPTY')), therefore the resulted left 
joined row will be (1,NULL,CD5415192314304,00071, NULL, NULL, NULL). Since this 
row meets the where clause, it won't be filtered out and thus an extra row is 
resulted.

> Incorrect result returns when using COALESCE in WHERE condition with LEFT JOIN
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-12566
>                 URL: https://issues.apache.org/jira/browse/HIVE-12566
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 0.13.0
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>            Priority: Critical
>
> The left join query with on/where clause returns incorrect result (more rows 
> are returned). See the reproducible sample below.
> Left table with data:
> {code}
> CREATE TABLE ltable (i int, la int, lk1 string, lk2 string) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',';
> ---
> 1,\N,CD5415192314304,00071
> 2,\N,CD5415192225530,00071
> {code}
> Right  table with data:
> {code}
> CREATE TABLE rtable (ra int, rk1 string, rk2 string) ROW FORMAT DELIMITED 
> FIELDS TERMINATED BY ',';
> ---
> 1,CD5415192314304,00071
> 45,CD5415192314304,00072
> {code}
> Query:
> {code}
> SELECT * FROM ltable l LEFT OUTER JOIN rtable r on (l.lk1 = r.rk1 AND l.lk2 = 
> r.rk2) WHERE COALESCE(l.la,'EMPTY')=COALESCE(r.ra,'EMPTY');
> {code}
> Result returns:
> {code}
> 1     NULL    CD5415192314304 00071   NULL    NULL    NULL
> 2     NULL    CD5415192225530 00071   NULL    NULL    NULL
> {code}
> The correct result should be
> {code}
> 2     NULL    CD5415192225530 00071   NULL    NULL    NULL
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to