ChaoFang created FLINK-39056:
--------------------------------

             Summary: Duplicate Data Issue in Iceberg Sink During Two-Phase Commit
                 Key: FLINK-39056
                 URL: https://issues.apache.org/jira/browse/FLINK-39056
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
    Affects Versions: 3.0.0
         Environment: flink-1.20, flink-cdc-3.5
            Reporter: ChaoFang
             Fix For: cdc-3.6.0


h2. Summary

This PR addresses a critical issue in the Flink CDC Iceberg sink where task 
interruptions or restarts during the two-phase commit (2PC) process could 
result in duplicate data being committed to Iceberg tables.

h2. Problem

In the previous implementation, the Iceberg sink did not track which Flink 
checkpoints had already been successfully committed to the Iceberg table. If a 
task failed after a successful Iceberg commit but before Flink could 
acknowledge checkpoint completion, or if a commit was retried, the same set of 
data could be committed a second time, leaving duplicate rows and an 
inconsistent table.
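
The failure mode above can be illustrated with a minimal, self-contained sketch of an idempotent commit. It is not the actual patch. It assumes the committer records the job ID and checkpoint ID in each snapshot's summary and skips any checkpoint that is already recorded. FakeTable, IdempotentCommitter, and the "sketch.*" property keys are hypothetical names used only for illustration; a real sink would read and write these values through the Iceberg table API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for an Iceberg table: every commit appends one snapshot. */
class FakeTable {
    final List<Map<String, String>> snapshots = new ArrayList<>();
    final List<String> rows = new ArrayList<>();

    void commit(List<String> newRows, Map<String, String> summary) {
        rows.addAll(newRows);
        snapshots.add(new HashMap<>(summary));
    }
}

/** Hypothetical committer that refuses to commit a checkpoint twice. */
class IdempotentCommitter {
    // Illustrative property keys, not taken from the issue.
    static final String JOB_ID_KEY = "sketch.job-id";
    static final String CHECKPOINT_ID_KEY = "sketch.max-committed-checkpoint-id";

    private final FakeTable table;
    private final String jobId;

    IdempotentCommitter(FakeTable table, String jobId) {
        this.table = table;
        this.jobId = jobId;
    }

    /** Walks snapshots from newest to oldest and returns the last checkpoint this job committed. */
    long maxCommittedCheckpointId() {
        for (int i = table.snapshots.size() - 1; i >= 0; i--) {
            Map<String, String> summary = table.snapshots.get(i);
            if (jobId.equals(summary.get(JOB_ID_KEY))) {
                return Long.parseLong(summary.get(CHECKPOINT_ID_KEY));
            }
        }
        return -1L; // this job has not committed anything yet
    }

    /** Returns true if the data was committed, false if the checkpoint was already in the table. */
    boolean commit(long checkpointId, List<String> newRows) {
        if (checkpointId <= maxCommittedCheckpointId()) {
            return false; // retry or replay after restart: skip instead of duplicating
        }
        Map<String, String> summary = new HashMap<>();
        summary.put(JOB_ID_KEY, jobId);
        summary.put(CHECKPOINT_ID_KEY, Long.toString(checkpointId));
        table.commit(newRows, summary);
        return true;
    }
}
```

In this sketch the committed checkpoint ID lives in the table itself rather than in Flink state, so a committer recreated after a restart still sees it. Replaying checkpoint 1 with the same rows after a restart returns false and leaves the table unchanged, while checkpoint 2 commits normally.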



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
