[ 
https://issues.apache.org/jira/browse/IMPALA-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200435#comment-17200435
 ] 

ASF subversion and git services commented on IMPALA-5022:
---------------------------------------------------------

Commit dcdbaf12224e7029dde3110d56edea022dd3a2a0 in impala's branch 
refs/heads/master from xqhe
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dcdbaf1 ]

IMPALA-5022 part 1: Implement core functions of outer join simplification

Outer joins in SQL can return rows with certain columns filled with
NULLs when a match cannot be found. However, such rows can be
rejected by null-rejecting predicates. The conditions in a null-rejecting
predicate that never evaluate to TRUE for NULLs (they yield FALSE or
NULL, and a filter discards the row either way) are referred to as
null-filtering conditions.
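
As a rough illustration of what makes a condition null-filtering, the
sketch below models SQL three-valued logic in Python, using None for
NULL. The helper names (greater_than, is_null, where_keeps) are
hypothetical and are not part of Impala.

```python
from typing import Optional


def greater_than(value: Optional[int], bound: int) -> Optional[bool]:
    """Model SQL `value > bound`: the result is unknown (None) if value is NULL."""
    return None if value is None else value > bound


def is_null(value: Optional[int]) -> bool:
    """Model SQL `value IS NULL`, which is TRUE for NULL input."""
    return value is None


def where_keeps(result: Optional[bool]) -> bool:
    """A WHERE clause keeps a row only when the predicate is TRUE."""
    return result is True


# `B.v > 10` can never be TRUE for a NULL B.v, so it is null-filtering.
rejects_null = not where_keeps(greater_than(None, 10))

# `B.v IS NULL` is TRUE for a NULL B.v, so it is not null-filtering.
keeps_null = where_keeps(is_null(None))
```

A condition of the first kind discards every NULL-extended row that the
outer join produces, which is what makes the conversion safe.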

In general, an outer join can be converted to an inner join if there
exist null-filtering conditions on the inner tables. In a left outer
join, the right table is the inner table, while in a right outer join
it is the left table. In a full outer join, both tables are inner tables.
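
The rule above can be sketched as a small decision function. This is a
minimal sketch, assuming we already know which side of the join the
null-filtering conditions reference; the function name and its boolean
parameters are hypothetical and do not mirror the planner code in this
commit.

```python
def simplify_outer_join(join_type: str, filters_left: bool, filters_right: bool) -> str:
    """Return the join type that remains after simplification.

    join_type     -- one of "INNER", "LEFT", "RIGHT", "FULL"
    filters_left  -- a null-filtering condition references the left table
    filters_right -- a null-filtering condition references the right table
    """
    # Unmatched left rows carry NULLs in the right table's columns, so they
    # survive only if nothing null-filters the right table.
    keeps_unmatched_left = join_type in ("LEFT", "FULL") and not filters_right
    # Unmatched right rows carry NULLs in the left table's columns, so they
    # survive only if nothing null-filters the left table.
    keeps_unmatched_right = join_type in ("RIGHT", "FULL") and not filters_left

    if keeps_unmatched_left and keeps_unmatched_right:
        return "FULL"
    if keeps_unmatched_left:
        return "LEFT"
    if keeps_unmatched_right:
        return "RIGHT"
    return "INNER"
```

For instance, simplify_outer_join("FULL", True, False) returns "LEFT":
the condition on the left table removes the unmatched right rows only.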

The option ENABLE_OUTER_JOIN_TO_INNER_TRANSFORMATION enables or disables
the entire rewrite. It is false by default until more thorough
functional testing has been done.

For example,
1. A LEFT JOIN B ON A.id = B.id WHERE B.v > 10
= A INNER JOIN B ON A.id = B.id WHERE B.v > 10

2. A RIGHT JOIN B ON A.id = B.id WHERE A.v > 10
= A INNER JOIN B ON A.id = B.id WHERE A.v > 10

3. A FULL JOIN B ON A.id = B.id WHERE A.v > 10
= A LEFT JOIN B ON A.id = B.id WHERE A.v > 10

4. A FULL JOIN B ON A.id = B.id WHERE B.v > 10
= A RIGHT JOIN B ON A.id = B.id WHERE B.v > 10

5. A FULL JOIN B ON A.id = B.id WHERE A.v > 10 AND B.v > 10
= A INNER JOIN B ON A.id = B.id WHERE A.v > 10 AND B.v > 10

6. A LEFT JOIN B ON A.id = B.id INNER JOIN C ON B.id = C.id
= A INNER JOIN B ON A.id = B.id INNER JOIN C ON B.id = C.id

7. A RIGHT JOIN B ON A.id = B.id INNER JOIN C ON A.id = C.id
= A INNER JOIN B ON A.id = B.id INNER JOIN C ON A.id = C.id

8. A FULL JOIN B ON A.id = B.id INNER JOIN C ON A.id = C.id
= A LEFT JOIN B ON A.id = B.id INNER JOIN C ON A.id = C.id

9. A FULL JOIN B ON A.id = B.id INNER JOIN C ON B.id = C.id
= A RIGHT JOIN B ON A.id = B.id INNER JOIN C ON B.id = C.id

10. A FULL JOIN B ON A.id = B.id INNER JOIN C ON A.id + B.id = C.id
= A INNER JOIN B ON A.id = B.id INNER JOIN C ON A.id + B.id = C.id
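
Example 1 can be checked on a toy data set. The sketch below uses
Python's built-in sqlite3 module as a stand-in engine, not Impala; the
table contents and the run() helper are made up for illustration.

```python
import sqlite3


def run(conn: sqlite3.Connection, sql: str) -> list:
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, v INTEGER);
    CREATE TABLE B (id INTEGER, v INTEGER);
    INSERT INTO A VALUES (1, 5), (2, 20), (3, 30);
    INSERT INTO B VALUES (1, 15), (2, 5), (4, 50);
""")

select = "SELECT A.id, B.id FROM A {join} JOIN B ON A.id = B.id {where} ORDER BY A.id"

# With the null-filtering condition B.v > 10, both join types agree.
left_filtered = run(conn, select.format(join="LEFT", where="WHERE B.v > 10"))
inner_filtered = run(conn, select.format(join="INNER", where="WHERE B.v > 10"))

# Without it, the LEFT JOIN keeps the unmatched row A.id = 3 and they differ.
left_plain = run(conn, select.format(join="LEFT", where=""))
inner_plain = run(conn, select.format(join="INNER", where=""))

conn.close()
```

On this data the two filtered results are identical, while the two
unfiltered results differ by the NULL-extended row for A.id = 3.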

This commit supports most of the cases in which an outer join can be
converted to an inner join. The exception is an outer join embedded in
an inline view that would be converted by a join condition outside the
view, as in
"SELECT * FROM T1 JOIN (SELECT T3.A A FROM T2 LEFT JOIN T3
ON T3.B=T2.B) T4 ON T4.A=T1.A". That case will be supported in part 2.

Tests:
* Add new plan tests in outer-to-inner-joins.test
* Add new query tests to verify the correctness of the transformation
* Ran the full set of verifications in Impala Public Jenkins

Change-Id: Iaa7804033fac68e93f33c387dc68ef67f803e93e
Reviewed-on: http://gerrit.cloudera.org:8080/16266
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Outer join simplification
> -------------------------
>
>                 Key: IMPALA-5022
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5022
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.9.0
>            Reporter: Greg Rahn
>            Assignee: Xianqing He
>            Priority: Major
>              Labels: planner, tpc-ds
>
> As a general rule, an outer join can be converted to an inner join if there 
> is a condition on the inner table that filters out non‑matching rows. In a 
> left outer join, the right table is the inner table, while it is the left 
> table in a right outer join. In a full outer join, both tables are inner 
> tables. Conditions that are FALSE for nulls are referred to as null filtering 
> conditions, and these are the conditions that enable the outer‑to‑inner join 
> conversion to be made.
> An outer join can be converted to an inner join if at least one of the 
> following conditions is true.
> * The WHERE clause contains at least one null filtering condition on the 
> inner table.
> * The outer join is involved in another join, and the other join condition 
> has one or more null filtering conditions on the inner table. The other join 
> in this case can be an inner join, left outer join, or right outer join. It 
> cannot be a full outer join because there is no inner table in this case.
> A null filtering condition on the left side of a full outer join converts it 
> to a left outer join, while a null filtering condition on the right side 
> converts it to a right outer join.
> For example, the following query
> {noformat}
> select t1.c1, t2.c1
> from t1 left outer join t2 using (x)
> where t2.c2 > 5
> {noformat}
> can safely be converted to
> {noformat}
> select t1.c1, t2.c1
> from t1 join t2 using (x)
> where t2.c2 > 5
> {noformat}
> because the predicate {{t2.c2 > 5}} is interpreted as FALSE if {{t2.c2}} is 
> NULL and therefore the condition removes all non‑matching rows of the outer 
> join.


