[ 
https://issues.apache.org/jira/browse/FLINK-39056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-39056:
-----------------------------------
    Labels: Flink-CDC iceberg pull-request-available  (was: Flink-CDC iceberg)

> FLINK-CDC Duplicate Data Issue in Iceberg Sink During Two-Phase Commit
> ----------------------------------------------------------------------
>
>                 Key: FLINK-39056
>                 URL: https://issues.apache.org/jira/browse/FLINK-39056
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: 3.0.0
>         Environment: flink-1.20
> flink-cdc-3.5
>            Reporter: ChaoFang
>            Priority: Major
>              Labels: Flink-CDC, iceberg, pull-request-available
>             Fix For: cdc-3.6.0
>
>
> h2. Summary
> This PR addresses a critical issue in the Flink CDC Iceberg sink where task 
> interruptions or restarts during the two-phase commit (2PC) process could 
> result in duplicate data being committed to Iceberg tables.
> h2. Problem
> In the previous implementation, the Iceberg sink did not track which Flink 
> checkpoints had already been committed to the Iceberg table as snapshots. 
> If a task failed after a successful Iceberg commit but before Flink 
> acknowledged completion of the checkpoint, or if a commit was retried, the 
> same set of data could be committed a second time, leaving duplicated and 
> inconsistent data in the table.
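
One way to picture the missing bookkeeping is the minimal Java sketch below. It is not the code from the linked pull request: the table is an in-memory stand-in, and the class names and the `sketch.*` summary keys are hypothetical. The only assumption it relies on is that each committed snapshot can carry a small map of string properties, so the committer can record the checkpoint id next to the data and consult it before committing again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: an in-memory model of an idempotent checkpoint commit. */
final class IdempotentCommitSketch {

    // Hypothetical summary keys, chosen for this sketch only.
    static final String JOB_ID = "sketch.job-id";
    static final String OPERATOR_ID = "sketch.operator-id";
    static final String MAX_CHECKPOINT_ID = "sketch.max-committed-checkpoint-id";

    /** One committed snapshot: the files it added plus its summary properties. */
    static final class Snapshot {
        final List<String> files;
        final Map<String, String> summary;

        Snapshot(List<String> files, Map<String, String> summary) {
            this.files = List.copyOf(files);
            this.summary = Map.copyOf(summary);
        }
    }

    /** Append-only snapshot log standing in for the table metadata. */
    static final class Table {
        private final List<Snapshot> snapshots = new ArrayList<>();

        List<Snapshot> snapshots() {
            return List.copyOf(snapshots);
        }

        void append(Snapshot snapshot) {
            snapshots.add(snapshot);
        }
    }

    /** Commits the files of a checkpoint at most once per job and operator. */
    static final class Committer {
        private final Table table;
        private final String jobId;
        private final String operatorId;

        Committer(Table table, String jobId, String operatorId) {
            this.table = table;
            this.jobId = jobId;
            this.operatorId = operatorId;
        }

        /** Highest checkpoint id already committed by this job/operator, or -1. */
        long maxCommittedCheckpointId() {
            List<Snapshot> all = table.snapshots();
            // Scan newest first: checkpoint ids only grow within one job.
            for (int i = all.size() - 1; i >= 0; i--) {
                Map<String, String> s = all.get(i).summary;
                if (jobId.equals(s.get(JOB_ID)) && operatorId.equals(s.get(OPERATOR_ID))) {
                    return Long.parseLong(s.get(MAX_CHECKPOINT_ID));
                }
            }
            return -1L;
        }

        /** Returns true if a snapshot was written, false if the call was a replay. */
        boolean commit(long checkpointId, List<String> files) {
            if (checkpointId <= maxCommittedCheckpointId()) {
                return false; // already in the table: skip instead of duplicating
            }
            table.append(new Snapshot(files, Map.of(
                    JOB_ID, jobId,
                    OPERATOR_ID, operatorId,
                    MAX_CHECKPOINT_ID, Long.toString(checkpointId))));
            return true;
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        Committer committer = new Committer(table, "job-1", "op-1");
        System.out.println(committer.commit(7L, List.of("data-7.parquet")));
        // Simulated restart: the same checkpoint is handed to the committer again.
        System.out.println(committer.commit(7L, List.of("data-7.parquet")));
        System.out.println(table.snapshots().size());
    }
}
```

Because the checkpoint id is written in the same snapshot as the data, a committer that restarts after the commit but before the acknowledgement can read the table back and recognize the replay. The sketch keys the lookup on a job id and operator id; what identity survives a restart in a real deployment is a separate question that it does not address.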



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
