[ 
https://issues.apache.org/jira/browse/FLINK-23740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Young updated FLINK-23740:
-------------------------------
    Fix Version/s: 1.15.0

> SQL Full Outer Join bug
> -----------------------
>
>                 Key: FLINK-23740
>                 URL: https://issues.apache.org/jira/browse/FLINK-23740
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.13.1, 1.13.2
>            Reporter: Fu Kai
>            Priority: Critical
>             Fix For: 1.15.0
>
>
> Hi team,
> We encountered an issue about FULL OUTER JOIN of Flink SQL, which happens 
> occasionally at very low probability that join output records cannot be 
> correctly updated. We cannot locate the root cause for now by glancing at the 
> SQL join logic in 
> [StreamingJoinOperator.|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/join/stream/StreamingJoinOperator.java#L198]
>  It cannot be stably reproduced and it does happen with massive data volume.
> The reason we suspect it's the FULL OUER join problem instead of others like 
> LEFT OUTER join is because the issue only arises after we introduced FULL 
> OUTER into the join flow. The query we are using is like the following. The 
> are two join code pieces below, the fist one contains solely left join(though 
> with nested) and there is no issue detected; the second one contains both 
> left and full outer join(nested as well), and the problem is that sometimes 
> update from the left table A(and other tables before the full outer join 
> operator) cannot be reflected in the final output. We suspect it could be the 
> introduce of full outer join that caused the problem, although at a very low 
> probability(~10 out of ~30million). 
> The root cause of the bug could be something else, the suspecting of FULL OUT 
> join is based on the result of our current experiment and observation.
> {code:java}
> create table A(
>     k1 int,
>     k2 int,
>     k3 int,
>     k4 int,
>     k5 int,
> PRIMARY KEY (k1, k2, k3, k4, k5) NOT ENFORCED
> ) WITH ();
> create table B(
>     k1 int,
>     k2 int,
>     k3 int,
> PRIMARY KEY (k1, k2, k3) NOT ENFORCED
> ) WITH ();
> create table C(
>     k1 int,
>     k2 int,
>     k3 int,
> PRIMARY KEY (k1, k2, k3) NOT ENFORCED
> ) WITH ();
> create table D(
>     k1 int,
>     k2 int,
> PRIMARY KEY (k1, k2) NOT ENFORCED
> ) WITH ();
> // query with left join, no issue detected
> select * from A 
> left outer join 
> (select * from B
>     left outer join C
>     on 
>         B.k1 = C.k1
>         B.k2 = C.k2
>         B.k3 = C.k3
> ) as BC
> on
>     A.k1 = BC.k1
>     A.k2 = BC.k2
>     A.k3 = BC.k3
> left outer join D
> on 
>     A.k1 = D.k1
>     A.k2 = D.k2
> ;
> // query with full outer join combined with left outer join, record updates 
> from left table A cannot be updated in the final output record some times
> select * from A 
> left outer join 
> (select * from B
>     full outer join C
>     on 
>     B.k1 = C.k1
>     B.k2 = C.k2
>     B.k3 = C.k3
> ) as BC
> on
> A.k1 = BC.k1
> A.k2 = BC.k2
> A.k3 = BC.k3
> left outer join D
> on 
> A.k1 = D.k1
> A.k2 = D.k2
> ;
> {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to