[
https://issues.apache.org/jira/browse/FLINK-19795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220462#comment-17220462
]
Jark Wu commented on FLINK-19795:
---------------------------------
Regarding to this issue, I think this is very common, because usually the
message delivery between CDC tool and message queue is at-least-once semantic.
Therefore, the change event may duplicate. We have to drop such duplicate
events before processing by the query operators (aggregate, join, topn,
etc...).
> Flink SQL throws exception when changelog source contains duplicate change
> events
> ---------------------------------------------------------------------------------
>
> Key: FLINK-19795
> URL: https://issues.apache.org/jira/browse/FLINK-19795
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime
> Affects Versions: 1.11.2
> Reporter: jinxin
> Priority: Major
>
> We are using Canal to synchornize MySQL data into Kafka, the synchornization
> delivery is not exactly-once, so there might be dupcliate
> INSERT/UPDATE/DELETE messages for the same primary key. We are using
> {{'connecotr' = 'kafka', 'format' = 'canal-json'}} to consume such topic.
> However, when appling TopN query on this created source table, the TopN
> operator will thrown exception: {{Caused by: java.lang.RuntimeException: Can
> not retract a non-existent record. This should never happen.}}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)