[ 
https://issues.apache.org/jira/browse/FLINK-19795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220462#comment-17220462
 ] 

Jark Wu commented on FLINK-19795:
---------------------------------

Regarding to this issue, I think this is very common, because usually the 
message delivery between CDC tool and message queue is at-least-once semantic. 
Therefore, the change event may duplicate. We have to drop such duplicate 
events before processing by the query operators (aggregate, join, topn, 
etc...). 

> Flink SQL throws exception when changelog source contains duplicate change 
> events
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-19795
>                 URL: https://issues.apache.org/jira/browse/FLINK-19795
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.11.2
>            Reporter: jinxin
>            Priority: Major
>
> We are using Canal to synchornize MySQL data into Kafka, the synchornization 
> delivery is not exactly-once, so there might be dupcliate 
> INSERT/UPDATE/DELETE messages for the same primary key. We are using 
> {{'connecotr' = 'kafka', 'format' = 'canal-json'}} to consume such topic. 
> However, when appling TopN query on this created source table, the TopN 
> operator will thrown exception: {{Caused by: java.lang.RuntimeException: Can 
> not retract a non-existent record. This should never happen.}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to