[jira] [Closed] (FLINK-28604) job failover and not restore from checkpoint in zookeeper HA mode

KevinyhZou (Jira) Tue, 19 Jul 2022 03:21:08 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-28604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


KevinyhZou closed FLINK-28604.
------------------------------
    Fix Version/s: 1.14.5
       Resolution: Fixed

> job failover and not restore from checkpoint in zookeeper HA mode
> -----------------------------------------------------------------
>
>                 Key: FLINK-28604
>                 URL: https://issues.apache.org/jira/browse/FLINK-28604
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.2
>            Reporter: KevinyhZou
>            Priority: Major
>             Fix For: 1.14.5
>
>         Attachments: image-2022-07-19-14-30-27-198.png
>
>
> I ran a job on Flink 1.14.2 with ZooKeeper HA configured as follows:
> {code:java}
> high-availability.storageDir: hdfs://testcluster/app/ha
> high-availability: zookeeper
> high-availability.zookeeper.quorum: *****
> high-availability.zookeeper.path.root: /flink{code}
> When a ZooKeeper node restarted, the JM failed over and logged "Close and 
> clean up all data for ZookeeperHaServices", so the HA data was cleaned up when 
> the first JM shut down. 
> When the second JM started, it logged "No checkpoint found during 
> restore", and there was no checkpoint to restore from.
> While debugging, I found that on job failover the execution reaches 
> `ClusterEntryPoint.java` line 285
> !image-2022-07-19-14-30-27-198.png!
> and sets `cleanupHaData` to true.
>  
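The effect described in the report can be illustrated with a small, self-contained model. This is only a sketch and not Flink's code: the class HaCleanupSketch, the nested HaStore, and the checkpoint name "chk-42" are hypothetical, and the method names merely echo the log messages quoted above. It assumes that the HA store holds the checkpoint pointers and that shutdown either keeps or wipes them depending on a boolean flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of HA-data handling at JobManager shutdown. Not Flink code. */
public class HaCleanupSketch {

    /** Stand-in for the HA metadata store that holds checkpoint pointers. */
    public static final class HaStore {
        private final List<String> checkpointPointers = new ArrayList<>();

        public void addCheckpoint(String pointer) {
            checkpointPointers.add(pointer);
        }

        /** Releases the store but keeps the metadata for the next JobManager. */
        public void close() {
            // Nothing to delete: pointers survive the shutdown.
        }

        /** Releases the store and deletes all metadata. */
        public void closeAndCleanupAllData() {
            checkpointPointers.clear();
        }

        /** Returns the most recently added pointer, or null if none is stored. */
        public String latestCheckpointOrNull() {
            return checkpointPointers.isEmpty()
                    ? null
                    : checkpointPointers.get(checkpointPointers.size() - 1);
        }
    }

    /** Shutdown path: the flag alone decides whether the HA data survives. */
    public static void shutDown(HaStore store, boolean cleanupHaData) {
        if (cleanupHaData) {
            store.closeAndCleanupAllData();
        } else {
            store.close();
        }
    }

    /** What a newly started JobManager would report when it looks for a checkpoint. */
    public static String restore(HaStore store) {
        String latest = store.latestCheckpointOrNull();
        return latest == null
                ? "No checkpoint found during restore"
                : "Restoring from " + latest;
    }

    public static void main(String[] args) {
        HaStore store = new HaStore();
        store.addCheckpoint("chk-42");
        shutDown(store, true); // the failover path observed in the report
        System.out.println(restore(store));
    }
}
```

In this model, passing false to shutDown leaves "chk-42" in place for the next JobManager, which is the behaviour one would expect on a failover that is not a final job termination.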



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
