[
https://issues.apache.org/jira/browse/HIVE-12566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037307#comment-15037307
]
Chaoyu Tang commented on HIVE-12566:
------------------------------------
I think that pushing down of where clause to join condition should not be
applicable to any outer join. Otherwise, the initially matched rows from two
joined tables might no longer match with more join conditions, and an extra row
will result in some outer join cases.
For an example, in above provided case, the ltable row
(1,NULL,CD5415192314304,00071) could join with rtable row
(1,CD5415192314304,00071) on condition (l.lk1 = r.rk1 AND l.lk2 = r.rk2), and
generates a joined row (1,NULL,CD5415192314304,00071, 1,CD5415192314304,00071),
but which is filtered out in the final result due to not meeting the where
clause.
But if the where clause is pushed down to the join condition, the rtable row
(1,CD5415192314304,00071) does no longer match ltable row
(1,NULL,CD5415192314304,00071) on the new stricter condition
(l.lk1 = r.rk1 AND l.lk2 = r.rk2 and
COALESCE(l.la,'EMPTY')=COALESCE(r.ra,'EMPTY')), therefore the resulted left
joined row will be (1,NULL,CD5415192314304,00071, NULL, NULL, NULL). Since this
row meets the where clause, it won't be filtered out and thus an extra row is
resulted.
> Incorrect result returns when using COALESCE in WHERE condition with LEFT JOIN
> ------------------------------------------------------------------------------
>
> Key: HIVE-12566
> URL: https://issues.apache.org/jira/browse/HIVE-12566
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 0.13.0
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Priority: Critical
>
> The left join query with on/where clause returns incorrect result (more rows
> are returned). See the reproducible sample below.
> Left table with data:
> {code}
> CREATE TABLE ltable (i int, la int, lk1 string, lk2 string) ROW FORMAT
> DELIMITED FIELDS TERMINATED BY ',';
> ---
> 1,\N,CD5415192314304,00071
> 2,\N,CD5415192225530,00071
> {code}
> Right table with data:
> {code}
> CREATE TABLE rtable (ra int, rk1 string, rk2 string) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
> ---
> 1,CD5415192314304,00071
> 45,CD5415192314304,00072
> {code}
> Query:
> {code}
> SELECT * FROM ltable l LEFT OUTER JOIN rtable r on (l.lk1 = r.rk1 AND l.lk2 =
> r.rk2) WHERE COALESCE(l.la,'EMPTY')=COALESCE(r.ra,'EMPTY');
> {code}
> Result returns:
> {code}
> 1 NULL CD5415192314304 00071 NULL NULL NULL
> 2 NULL CD5415192225530 00071 NULL NULL NULL
> {code}
> The correct result should be
> {code}
> 2 NULL CD5415192225530 00071 NULL NULL NULL
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)