[ https://issues.apache.org/jira/browse/FLINK-39056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ChaoFang updated FLINK-39056:
-----------------------------
Summary: FLINK-CDC Duplicate Data Issue in Iceberg Sink During Two-Phase
Commit (was: Duplicate Data Issue in Iceberg Sink During Two-Phase Commit)
> FLINK-CDC Duplicate Data Issue in Iceberg Sink During Two-Phase Commit
> ----------------------------------------------------------------------
>
> Key: FLINK-39056
> URL: https://issues.apache.org/jira/browse/FLINK-39056
> Project: Flink
> Issue Type: Bug
> Components: Flink CDC
> Affects Versions: 3.0.0
> Environment: flink-1.20
> flink-cdc-3.5
> Reporter: ChaoFang
> Priority: Major
> Labels: Flink-CDC, iceberg
> Fix For: cdc-3.6.0
>
>
> h2. Summary
> This PR addresses a critical issue in the Flink CDC Iceberg sink where task
> interruptions or restarts during the two-phase commit (2PC) process could
> result in duplicate data being committed to Iceberg tables.
> h2. Problem
> In the previous implementation, the Iceberg sink did not track which Flink
> checkpoints had already been successfully committed to the Iceberg snapshot.
> If a task failed after a successful Iceberg commit but before Flink could
> acknowledge the checkpoint completion, or if a commit was retried, the same
> set of data could be committed again, leading to data inconsistency and
> duplication.
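
To illustrate the failure mode described under "Problem", here is a minimal sketch of one common way to make such a commit idempotent: record the checkpoint ID in the snapshot metadata and skip any replayed commit whose ID is not newer than the last recorded one. This is not the code from the PR. The in-memory table, the summary keys, and the function `commit_idempotently` are all hypothetical names invented for this example; a real sink would read and write this information through the Iceberg table's snapshot summary.

```python
from dataclasses import dataclass, field

# Hypothetical summary keys, chosen for this sketch only.
JOB_ID_KEY = "sketch.job-id"
CHECKPOINT_ID_KEY = "sketch.max-committed-checkpoint-id"


@dataclass
class Snapshot:
    rows: list
    summary: dict


@dataclass
class InMemoryTable:
    """Stand-in for an Iceberg table: an append-only list of snapshots."""
    snapshots: list = field(default_factory=list)

    def max_committed_checkpoint(self, job_id):
        # Walk snapshots newest-first and return the checkpoint id
        # recorded by the most recent snapshot written by this job.
        for snap in reversed(self.snapshots):
            if snap.summary.get(JOB_ID_KEY) == job_id:
                return int(snap.summary[CHECKPOINT_ID_KEY])
        return -1  # this job has not committed anything yet

    def all_rows(self):
        return [row for snap in self.snapshots for row in snap.rows]


def commit_idempotently(table, job_id, checkpoint_id, rows):
    """Commit rows for a checkpoint unless the table already contains it."""
    if checkpoint_id <= table.max_committed_checkpoint(job_id):
        # The commit succeeded before the failure; skip the replay.
        return False
    summary = {JOB_ID_KEY: job_id, CHECKPOINT_ID_KEY: str(checkpoint_id)}
    table.snapshots.append(Snapshot(rows=list(rows), summary=summary))
    return True
```

In this sketch, a task that restarts after the table commit but before Flink acknowledges the checkpoint calls `commit_idempotently` again with the same checkpoint ID, and the second call returns `False` without appending a snapshot. The job ID is stored next to the checkpoint ID because checkpoint IDs are only meaningful within a single job.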
--
This message was sent by Atlassian Jira
(v8.20.10#820010)