[
https://issues.apache.org/jira/browse/FLINK-19795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236224#comment-17236224
]
Jark Wu commented on FLINK-19795:
---------------------------------
I would propose adding a job option
{{table.exec.source.cdc-events-duplicate=true|false}} to indicate whether the
CDC source produces duplicate messages that require the framework to
deduplicate. By default, the value is false (for backward compatibility). When
it is set to true, a primary key must be defined on the source, and the
framework will use the primary key to deduplicate/normalize the changelog
(using the ChangelogNormalize operator).
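As a rough sketch of how this could look (the table name, schema, and
connector properties below are illustrative, and the {{SET}} statement
assumes SQL client syntax):
{code:sql}
-- Sketch only: the option below is the one proposed in this comment;
-- table name, columns, and connector properties are hypothetical.
SET table.exec.source.cdc-events-duplicate=true;

CREATE TABLE orders (
  order_id BIGINT,
  category STRING,
  amount   DECIMAL(10, 2),
  -- a primary key is required so the framework can deduplicate
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'canal-json'
);
{code}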
This option only takes effect on CDC sources that produce a full changelog,
including INSERT/UPDATE_BEFORE/UPDATE_AFTER/DELETE events. It doesn't affect
upsert sources, because a ChangelogNormalize operator is always generated
after an upsert source.
What do you think about this? [~Leonard Xu] [~godfreyhe]
> Fix Flink SQL throws exception when changelog source contains duplicate
> change events
> -------------------------------------------------------------------------------------
>
> Key: FLINK-19795
> URL: https://issues.apache.org/jira/browse/FLINK-19795
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / Runtime
> Affects Versions: 1.11.2
> Reporter: jinxin
> Assignee: Jark Wu
> Priority: Major
> Fix For: 1.12.0
>
>
> We are using Canal to synchronize MySQL data into Kafka. The
> synchronization delivery is not exactly-once, so there might be duplicate
> INSERT/UPDATE/DELETE messages for the same primary key. We are using
> {{'connector' = 'kafka', 'format' = 'canal-json'}} to consume such a topic.
> However, when applying a TopN query on this source table, the TopN
> operator throws an exception: {{Caused by: java.lang.RuntimeException: Can
> not retract a non-existent record. This should never happen.}}
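For reference, a minimal TopN query of the kind described above might look
roughly like the following (table and column names are hypothetical):
{code:sql}
-- Sketch of the failing scenario: a TopN query over a changelog source
-- that may contain duplicate change events per primary key.
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount DESC)
           AS row_num
  FROM orders
)
WHERE row_num <= 3;
{code}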