ChaoFang created FLINK-39056:
--------------------------------
Summary: Duplicate Data Issue in Iceberg Sink During Two-Phase Commit
Key: FLINK-39056
URL: https://issues.apache.org/jira/browse/FLINK-39056
Project: Flink
Issue Type: Bug
Components: Flink CDC
Affects Versions: 3.0.0
Environment: flink-1.20
flink-cdc-3.5
Reporter: ChaoFang
Fix For: cdc-3.6.0
h2. Summary
This PR addresses a critical issue in the Flink CDC Iceberg sink where task
interruptions or restarts during the two-phase commit (2PC) process could
result in duplicate data being committed to Iceberg tables.
h2. Problem
In the previous implementation, the Iceberg sink did not track which Flink
checkpoints had already been committed to the Iceberg table. If a task failed
after a successful Iceberg commit but before Flink acknowledged completion of
the checkpoint, or if a commit was retried, the same set of data could be
committed a second time, leaving the table with duplicated, inconsistent data.
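To make the failure mode concrete, here is a minimal, library-free sketch of one common way to make such a commit idempotent: record the checkpoint ID in the same snapshot as the data, and skip any commit whose ID is not newer than the highest one already recorded. This is an illustration under stated assumptions, not necessarily what the fix for this issue implements. `InMemoryTable`, `Snapshot`, and the summary key `max-committed-checkpoint-id` are hypothetical names invented for this example and do not come from the Flink CDC or Iceberg APIs.

```python
from dataclasses import dataclass, field

# Hypothetical summary key, used only in this sketch.
MAX_COMMITTED_CHECKPOINT_ID = "max-committed-checkpoint-id"


@dataclass
class Snapshot:
    rows: list
    summary: dict


@dataclass
class InMemoryTable:
    """Stand-in for an Iceberg table: an append-only list of snapshots."""

    snapshots: list = field(default_factory=list)

    def max_committed_checkpoint_id(self) -> int:
        # Walk snapshots newest-first and return the first recorded checkpoint ID.
        for snapshot in reversed(self.snapshots):
            value = snapshot.summary.get(MAX_COMMITTED_CHECKPOINT_ID)
            if value is not None:
                return int(value)
        return -1  # nothing has been committed yet

    def commit(self, checkpoint_id: int, rows: list) -> bool:
        """Commit rows for a checkpoint. Returns False if it was already committed."""
        if checkpoint_id <= self.max_committed_checkpoint_id():
            # Replayed commit after a restart or retry: skip it.
            return False
        # Data and checkpoint ID are written in one snapshot, so either both
        # are visible or neither is.
        summary = {MAX_COMMITTED_CHECKPOINT_ID: str(checkpoint_id)}
        self.snapshots.append(Snapshot(list(rows), summary))
        return True

    def all_rows(self) -> list:
        return [row for snapshot in self.snapshots for row in snapshot.rows]


if __name__ == "__main__":
    table = InMemoryTable()
    table.commit(1, ["a", "b"])
    # The task fails before Flink acknowledges checkpoint 1, so the same
    # commit is replayed after the restart.
    table.commit(1, ["a", "b"])
    table.commit(2, ["c"])
    print(table.all_rows())
```

Because the guard reads its state from the table itself rather than from Flink state, it still holds when the task restarts from a checkpoint older than the last successful commit. A real sink would likely also need to scope the recorded ID to a job or operator so that independent writers do not suppress each other's commits; that is omitted here.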
--
This message was sent by Atlassian Jira
(v8.20.10#820010)