[
https://issues.apache.org/jira/browse/FLINK-28604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
KevinyhZou closed FLINK-28604.
------------------------------
Fix Version/s: 1.14.5
Resolution: Fixed
> job failover and not restore from checkpoint in zookeeper HA mode
> -----------------------------------------------------------------
>
> Key: FLINK-28604
> URL: https://issues.apache.org/jira/browse/FLINK-28604
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.2
> Reporter: KevinyhZou
> Priority: Major
> Fix For: 1.14.5
>
> Attachments: image-2022-07-19-14-30-27-198.png
>
>
> Run a job on Flink 1.14.2 with ZooKeeper HA configured as follows:
> {code:java}
> high-availability.storageDir: hdfs://testcluster/app/ha
> high-availability: zookeeper
> high-availability.zookeeper.quorum: *****
> high-availability.zookeeper.path.root: /flink{code}
> When a ZooKeeper node restarted, the JM failed over and logged "Close and
> clean up all data for ZookeeperHaServices", so the HA data was cleaned up when
> the first JM shut down.
> When the second JM started, it logged "No checkpoint found during
> restore", and there was no checkpoint to restore from.
> While debugging, I found that when the job fails over, execution reaches
> `ClusterEntryPoint.java` line 285
> !image-2022-07-19-14-30-27-198.png!
> and sets `cleanupHaData` to true.
>
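The sequence reported above can be illustrated with a toy model. This is only a sketch of the described symptom, not Flink code: `ToyHaStore`, `restore`, the job id, and the checkpoint pointer string are all hypothetical names made up for the illustration. It assumes, as the report suggests, that closing the HA services with `cleanupHaData` set to true removes the stored checkpoint pointers, so the next JM finds nothing to restore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class HaCleanupSketch {

    // Hypothetical stand-in for the checkpoint pointers kept under the HA root.
    // This is NOT a Flink class; it only models "data survives unless cleaned up".
    static final class ToyHaStore {
        private final Map<String, String> checkpointPointers = new HashMap<>();

        void storeCheckpoint(String jobId, String pointer) {
            checkpointPointers.put(jobId, pointer);
        }

        Optional<String> latestCheckpoint(String jobId) {
            return Optional.ofNullable(checkpointPointers.get(jobId));
        }

        // Models the shutdown of the first JM: with cleanupHaData == true,
        // everything stored for recovery is deleted.
        void close(boolean cleanupHaData) {
            if (cleanupHaData) {
                checkpointPointers.clear();
            }
        }
    }

    // Models what the second JM does on startup: look up the latest checkpoint.
    static String restore(ToyHaStore store, String jobId) {
        return store.latestCheckpoint(jobId)
                .map(pointer -> "Restoring from " + pointer)
                .orElse("No checkpoint found during restore");
    }

    public static void main(String[] args) {
        String jobId = "job-1"; // hypothetical job id
        String pointer = "hdfs://testcluster/app/ha/made-up-checkpoint-pointer"; // made up

        // Failover that keeps the HA data: the second JM can restore.
        ToyHaStore kept = new ToyHaStore();
        kept.storeCheckpoint(jobId, pointer);
        kept.close(false);
        System.out.println(restore(kept, jobId));

        // Failover that cleans the HA data: the second JM finds nothing.
        ToyHaStore cleaned = new ToyHaStore();
        cleaned.storeCheckpoint(jobId, pointer);
        cleaned.close(true);
        System.out.println(restore(cleaned, jobId));
    }
}
```

In this model the only difference between the two runs is the flag passed to `close`, which mirrors the observation that the flag was true during a failover rather than only at final job termination.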