[jira] [Updated] (FLINK-28604) job failover and not restore from checkpoint in zookeeper HA mode

KevinyhZou (Jira) Mon, 18 Jul 2022 23:33:06 -0700


     [ https://issues.apache.org/jira/browse/FLINK-28604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


KevinyhZou updated FLINK-28604:
-------------------------------
    Description: 
Run a job on Flink 1.14.2 with ZooKeeper HA configured as follows:
{code:java}
high-availability.storageDir: hdfs://testcluster/app/ha
high-availability: zookeeper
high-availability.zookeeper.quorum: *****
high-availability.zookeeper.path.root: /flink{code}
When a ZooKeeper node restarted, I saw the JM fail over with the log "Close and 
clean up all data for ZookeeperHaServices", so the HA data was cleaned up when 
the first JM shut down. 

When the second JM started, it logged "No checkpoint found during 
restore", and there was no checkpoint to restore from.

From debugging, I found that when the job fails over, execution reaches 
`ClusterEntryPoint.java` line 285

!image-2022-07-19-14-30-27-198.png!

which sets `cleanupHaData` to true.
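
To make the reported symptom concrete, here is a minimal, self-contained sketch of how a cleanup flag on shutdown could lead to the second JM finding nothing to restore. This is not Flink's actual code: `HaCleanupSketch`, `HaStore`, `shutDown`, `restore`, the job id, and the checkpoint path are all hypothetical names made up for illustration. Only the `cleanupHaData` flag and the two log messages come from the report above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class HaCleanupSketch {

    /** Hypothetical stand-in for the HA store (ZooKeeper pointers plus files under storageDir). */
    static final class HaStore {
        private final Map<String, String> checkpointPointers = new HashMap<>();

        void recordCheckpoint(String jobId, String checkpointPath) {
            checkpointPointers.put(jobId, checkpointPath);
        }

        Optional<String> latestCheckpoint(String jobId) {
            return Optional.ofNullable(checkpointPointers.get(jobId));
        }

        /** Plain close: the stored data survives for the next JM. */
        void close() {
            // nothing to delete in this sketch
        }

        /** Close and delete everything, including checkpoint pointers. */
        void closeAndCleanupAllData() {
            checkpointPointers.clear();
        }
    }

    /** Hypothetical shutdown path: the flag decides whether HA data survives. */
    static void shutDown(HaStore store, boolean cleanupHaData) {
        if (cleanupHaData) {
            store.closeAndCleanupAllData();
        } else {
            store.close();
        }
    }

    /** What a newly started JM would report when it looks for a checkpoint. */
    static String restore(HaStore store, String jobId) {
        return store.latestCheckpoint(jobId)
                .map(path -> "Restoring from " + path)
                .orElse("No checkpoint found during restore");
    }

    public static void main(String[] args) {
        HaStore store = new HaStore();
        // Assumed example path, not taken from the report.
        store.recordCheckpoint("job-1", "hdfs://testcluster/app/ha/chk-42");

        // First JM shuts down with cleanupHaData = true, as observed in the report.
        shutDown(store, true);

        // Second JM starts and has nothing to restore from.
        System.out.println(restore(store, "job-1"));
    }
}
```

Under these assumptions, passing `false` instead of `true` to `shutDown` would leave the pointer in place and the second JM could restore, which is the behavior expected during a failover.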

 

  was:
Run a job on Flink 1.14.2 with ZooKeeper HA configured as follows:
{code:java}
high-availability.storageDir: hdfs://testcluster/app/flink/ha
high-availability: zookeeper
high-availability.zookeeper.quorum: *****
high-availability.zookeeper.path.root: /flink{code}
When a ZooKeeper node restarted, I saw the JM fail over with the log "Close and 
clean up all data for ZookeeperHaServices", so the HA data was cleaned up when 
the first JM shut down. 

When the second JM started, it logged "No checkpoint found during 
restore", and there was no checkpoint to restore from.

From debugging, I found that when the job fails over, execution reaches 
`ClusterEntryPoint.java` line 285

!image-2022-07-19-14-30-27-198.png!

which sets `cleanupHaData` to true.

 


> job failover and not restore from checkpoint in zookeeper HA mode
> -----------------------------------------------------------------
>
>                 Key: FLINK-28604
>                 URL: https://issues.apache.org/jira/browse/FLINK-28604
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.2
>            Reporter: KevinyhZou
>            Priority: Major
>         Attachments: image-2022-07-19-14-30-27-198.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
