[ 
https://issues.apache.org/jira/browse/FLINK-35857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866645#comment-17866645
 ] 

chenyuzhi commented on FLINK-35857:
-----------------------------------

Hi, [~gyfora] 

Could you take a look at this operator problem? I would be happy to help 
fix it if needed.

> Operator restart failed job without latest checkpoint
> -----------------------------------------------------
>
>                 Key: FLINK-35857
>                 URL: https://issues.apache.org/jira/browse/FLINK-35857
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.6.1
>         Environment:  flink kubernetes operator version: 1.6.1
> flink version 1.15.2
> flink job config:
> *execution.shutdown-on-application-finish=false*
>            Reporter: chenyuzhi
>            Priority: Major
>         Attachments: image-2024-07-17-15-03-29-618.png, 
> image-2024-07-17-15-04-32-913.png
>
>
> Using the Flink Kubernetes operator with the config: 
> {code:java}
> kubernetes.operator.job.restart.failed=true {code}
> we get different restart results for a failed job in two cases. 
> Case 1:
>  A job with periodic checkpointing enabled and an initial checkpoint path: 
> when it fails, the operator automatically redeploys the deployment with the 
> same job_id and the latest checkpoint path.
>  
> !image-2024-07-17-15-03-29-618.png|width=763,height=301!
>  
> Case 2:
>  A job with periodic checkpointing enabled but no initial checkpoint: when it 
> fails, the operator automatically redeploys the deployment with a different 
> job_id and no initial checkpoint path.
> !image-2024-07-17-15-04-32-913.png|width=759,height=287!
>  
> In case 2, the redeploy behaviour may cause data inconsistency. For example, 
> the Kafka source connector may consume data from the earliest/latest offset 
> rather than from the offsets recorded in the last checkpoint.
>  
> Thus I think a job with periodic checkpointing enabled but no initial 
> checkpoint should be restarted with the same job_id and the latest checkpoint 
> path, just like case 1.
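
A minimal sketch of the restart behaviour requested in the quoted description, to make the proposal concrete. It is not the operator's code: `FailedJob`, `RestartPlan`, `plan_restart` and the example paths are hypothetical names invented for illustration, and the only assumption is that the latest completed checkpoint path is known when the job fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailedJob:
    """Hypothetical snapshot of a failed job (not an operator API)."""
    job_id: str
    initial_checkpoint: Optional[str]  # path the job was first started from, if any
    latest_checkpoint: Optional[str]   # last completed periodic checkpoint, if any


@dataclass
class RestartPlan:
    """Hypothetical description of how the job would be redeployed."""
    job_id: str
    restore_path: Optional[str]


def plan_restart(job: FailedJob) -> RestartPlan:
    # Keep the job_id and prefer the latest completed checkpoint, whether or
    # not the job was originally started from a checkpoint (case 1 and case 2
    # are treated the same). Fall back to the initial path, then to no state.
    restore_path = job.latest_checkpoint or job.initial_checkpoint
    return RestartPlan(job_id=job.job_id, restore_path=restore_path)


# Case 2 from the description: no initial checkpoint, but periodic
# checkpoints were taken before the failure.
case2 = FailedJob(job_id="job-a", initial_checkpoint=None,
                  latest_checkpoint="s3://example-bucket/chk-42")
print(plan_restart(case2))
```

With this rule, a case 2 job is restored from its latest checkpoint under the same job_id instead of starting with no state.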



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
