[ 
https://issues.apache.org/jira/browse/HUDI-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenqiu Huang updated HUDI-8881:
--------------------------------
    Description: 
 Let's say checkpoint A has completed and the coordinator then starts the Hudi 
commit. If there is a write error, the table will be rolled back and the Flink 
job will fail after the exception is thrown. What happens when the job restarts 
from that failure? From my understanding, the data between the Kafka offset in 
checkpoint A-1 and the offset in checkpoint A will be lost.
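
To make the suspected gap concrete, here is a minimal Java sketch of the offset 
arithmetic. It assumes, following the scenario above and not verified against 
the Hudi code, that the source resumes from the offset stored in checkpoint A 
while the rollback discards everything written since checkpoint A-1. The class 
and method names are hypothetical.

```java
// Hypothetical model of the scenario; not Hudi or Flink code.
final class OffsetLossSketch {

    // Returns how many records are neither committed to the table nor replayed
    // after restart.
    //   offsetAtPrevCheckpoint: Kafka offset stored in checkpoint A-1
    //   offsetAtCheckpointA:    Kafka offset stored in checkpoint A
    //   commitSucceeded:        whether the Hudi commit after checkpoint A succeeded
    static long lostRecords(long offsetAtPrevCheckpoint,
                            long offsetAtCheckpointA,
                            boolean commitSucceeded) {
        // Assumption: a failed commit is rolled back, so the table only holds
        // data up to the offset of checkpoint A-1.
        long committedUpTo = commitSucceeded ? offsetAtCheckpointA : offsetAtPrevCheckpoint;

        // Assumption: checkpoint A already completed, so the restarted job
        // resumes reading from the offset stored in checkpoint A.
        long restartOffset = offsetAtCheckpointA;

        return Math.max(0L, restartOffset - committedUpTo);
    }
}
```

For example, with offset 100 in checkpoint A-1 and offset 250 in checkpoint A, 
a failed commit would leave 150 records unaccounted for under these 
assumptions, while a successful commit leaves none.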

We should check whether the WriteMetadataEvent contains a write failure as soon 
as the event is received. If there is any write failure, roll back immediately 
and throw an exception to prevent the checkpoint from completing. When the job 
restarts, it can then still read from the right Kafka offset and proceed.
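
A rough Java sketch of the proposed check follows. It is only an illustration 
of the idea, not the actual coordinator code: WriteEventSketch is a 
hypothetical stand-in for WriteMetadataEvent, and the failedRecords field, 
CoordinatorSketch, and its methods are all assumptions made for this example.

```java
// Hypothetical stand-in for WriteMetadataEvent with only the fields needed here.
final class WriteEventSketch {
    final int taskId;
    final long checkpointId;
    final int failedRecords; // assumed: count of records the write task failed to write

    WriteEventSketch(int taskId, long checkpointId, int failedRecords) {
        this.taskId = taskId;
        this.checkpointId = checkpointId;
        this.failedRecords = failedRecords;
    }

    boolean hasWriteFailure() {
        return failedRecords > 0;
    }
}

// Hypothetical coordinator that inspects each event on arrival.
final class CoordinatorSketch {
    private int bufferedEvents = 0;
    private boolean rolledBack = false;

    // Called when a write task reports its result, i.e. before the
    // checkpoint has been acknowledged as complete.
    void handleWriteEvent(WriteEventSketch event) {
        if (event.hasWriteFailure()) {
            // Roll back right away and fail fast so the checkpoint cannot complete.
            rollbackInflightInstant();
            throw new IllegalStateException(
                "Write failure reported by task " + event.taskId
                    + " for checkpoint " + event.checkpointId);
        }
        // No failure: keep the event for the commit that follows the checkpoint.
        bufferedEvents++;
    }

    // Placeholder for the real table rollback.
    private void rollbackInflightInstant() {
        rolledBack = true;
        bufferedEvents = 0;
    }

    int bufferedEvents() {
        return bufferedEvents;
    }

    boolean isRolledBack() {
        return rolledBack;
    }
}
```

Because the exception is thrown before the checkpoint completes, the job would 
restart from the previous checkpoint, whose Kafka offset still covers the 
records that failed to write.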

  was:
 Let's say checkpoint A has completed and the coordinator then starts the Hudi 
commit. If there is a write error, the table will be rolled back and the Flink 
job will fail after the exception is thrown. What happens when the job restarts 
from that failure? From my understanding, the data between the Kafka offset in 
checkpoint A-1 and the offset in checkpoint A will be lost.

We should check whether the WriteMetadataEvent contains a write failure as soon 
as the event is received. If there is any write failure, roll back immediately 
and throw an exception to prevent the checkpoint from completing.


> Potential data loss in Flink Hudi sink
> --------------------------------------
>
>                 Key: HUDI-8881
>                 URL: https://issues.apache.org/jira/browse/HUDI-8881
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: flink
>            Reporter: Zhenqiu Huang
>            Assignee: Zhenqiu Huang
>            Priority: Critical
>
>  Let's say checkpoint A has completed and the coordinator then starts the 
> Hudi commit. If there is a write error, the table will be rolled back and 
> the Flink job will fail after the exception is thrown. What happens when the 
> job restarts from that failure? From my understanding, the data between the 
> Kafka offset in checkpoint A-1 and the offset in checkpoint A will be lost.
>
> We should check whether the WriteMetadataEvent contains a write failure as 
> soon as the event is received. If there is any write failure, roll back 
> immediately and throw an exception to prevent the checkpoint from 
> completing. When the job restarts, it can then still read from the right 
> Kafka offset and proceed.



