[jira] [Commented] (FLINK-28604) job failover and not restore from checkpoint in zookeeper HA mode

Yun Tang (Jira) Sun, 24 Jul 2022 19:25:07 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-28604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570577#comment-17570577
 ]


Yun Tang commented on FLINK-28604:
----------------------------------

BTW, [~zouyunhe], can this problem be easily reproduced in flink-1.14.2, and 
does it disappear once we bump to flink-1.14.5?

> job failover and not restore from checkpoint in zookeeper HA mode
> -----------------------------------------------------------------
>
>                 Key: FLINK-28604
>                 URL: https://issues.apache.org/jira/browse/FLINK-28604
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.2
>            Reporter: KevinyhZou
>            Priority: Major
>         Attachments: image-2022-07-19-14-30-27-198.png
>
>
> Run a job on Flink 1.14.2 with ZooKeeper HA configured as follows:
> {code:java}
> high-availability.storageDir: hdfs://testcluster/app/ha
> high-availability: zookeeper
> high-availability.zookeeper.quorum: *****
> high-availability.zookeeper.path.root: /flink{code}
> When the ZooKeeper node restarts, the JM fails over and logs "Close and 
> clean up all data for ZookeeperHaServices", so the HA data is cleaned up when 
> the first JM shuts down. 
> When the second JM starts, it logs "No checkpoint found during 
> restore", and there is no checkpoint to restore from.
> From debugging, I found that when the job fails over, execution reaches 
> `ClusterEntrypoint.java` line 285
> !image-2022-07-19-14-30-27-198.png!
> and sets `cleanupHaData` to true.
>  
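
To check whether a given set of JobManager logs shows the sequence described in the report (an HA cleanup message followed later by a restore that finds no checkpoint), a small scanner along these lines may help. This is only a sketch: the function name `find_cleanup_then_empty_restore` is hypothetical, the two patterns are taken from the message fragments quoted above rather than from the Flink source, and it assumes the logs of both JMs have been concatenated in chronological order.

```python
import re
from typing import Iterable, Optional, Tuple

# Message fragments as quoted in the report. Matching is case-insensitive and
# whitespace-tolerant because the exact spelling in real logs may differ.
CLEANUP = re.compile(r"Close and clean up all data for\s+\S*HaServices", re.IGNORECASE)
NO_CHECKPOINT = re.compile(r"No checkpoint found during\s+restore", re.IGNORECASE)


def find_cleanup_then_empty_restore(lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (cleanup_line, restore_line) as 1-based line numbers if an HA
    cleanup message is later followed by a "no checkpoint" message.
    Return None if that sequence does not occur."""
    cleanup_at = None
    for number, line in enumerate(lines, start=1):
        if cleanup_at is None and CLEANUP.search(line):
            # Remember the first cleanup message only.
            cleanup_at = number
        elif cleanup_at is not None and NO_CHECKPOINT.search(line):
            return cleanup_at, number
    return None
```

Passing an open log file (or any iterable of lines) returns the two line numbers when the pattern is present, which gives a quick yes/no answer when trying to reproduce the issue on another Flink version.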



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
