[
https://issues.apache.org/jira/browse/FLINK-23740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405605#comment-17405605
]
Timo Walther commented on FLINK-23740:
--------------------------------------
[~zhihao] is it really necessary to use queryable state for fixing this issue?
FLINK-23989 is definitely a nice idea that is worth to be discussed but this
issue could also be investigated by taking a savepoint and inspecting a
consistent snapshot of the whole application, no?
> SQL Full Outer Join bug
> -----------------------
>
> Key: FLINK-23740
> URL: https://issues.apache.org/jira/browse/FLINK-23740
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime
> Affects Versions: 1.13.1, 1.13.2
> Reporter: Fu Kai
> Priority: Critical
>
> Hi team,
> We encountered an issue about FULL OUTER JOIN of Flink SQL, which happens
> occasionally at very low probability that join output records cannot be
> correctly updated. We cannot locate the root cause for now by glancing at the
> SQL join logic in
> [StreamingJoinOperator.|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/join/stream/StreamingJoinOperator.java#L198]
> It cannot be stably reproduced and it does happen with massive data volume.
> The reason we suspect it's the FULL OUER join problem instead of others like
> LEFT OUTER join is because the issue only arises after we introduced FULL
> OUTER into the join flow. The query we are using is like the following. The
> are two join code pieces below, the fist one contains solely left join(though
> with nested) and there is no issue detected; the second one contains both
> left and full outer join(nested as well), and the problem is that sometimes
> update from the left table A(and other tables before the full outer join
> operator) cannot be reflected in the final output. We suspect it could be the
> introduce of full outer join that caused the problem, although at a very low
> probability(~10 out of ~30million).
> The root cause of the bug could be something else, the suspecting of FULL OUT
> join is based on the result of our current experiment and observation.
> {code:java}
> create table A(
> k1 int,
> k2 int,
> k3 int,
> k4 int,
> k5 int,
> PRIMARY KEY (k1, k2, k3, k4, k5) NOT ENFORCED
> ) WITH ();
> create table B(
> k1 int,
> k2 int,
> k3 int,
> PRIMARY KEY (k1, k2, k3) NOT ENFORCED
> ) WITH ();
> create table C(
> k1 int,
> k2 int,
> k3 int,
> PRIMARY KEY (k1, k2, k3) NOT ENFORCED
> ) WITH ();
> create table D(
> k1 int,
> k2 int,
> PRIMARY KEY (k1, k2) NOT ENFORCED
> ) WITH ();
> // query with left join, no issue detected
> select * from A
> left outer join
> (select * from B
> left outer join C
> on
> B.k1 = C.k1
> B.k2 = C.k2
> B.k3 = C.k3
> ) as BC
> on
> A.k1 = BC.k1
> A.k2 = BC.k2
> A.k3 = BC.k3
> left outer join D
> on
> A.k1 = D.k1
> A.k2 = D.k2
> ;
> // query with full outer join combined with left outer join, record updates
> from left table A cannot be updated in the final output record some times
> select * from A
> left outer join
> (select * from B
> full outer join C
> on
> B.k1 = C.k1
> B.k2 = C.k2
> B.k3 = C.k3
> ) as BC
> on
> A.k1 = BC.k1
> A.k2 = BC.k2
> A.k3 = BC.k3
> left outer join D
> on
> A.k1 = D.k1
> A.k2 = D.k2
> ;
> {code}
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)