[jira] [Commented] (FLINK-27572) Verify HA Metadata present before performing last-state restore

Yang Wang (Jira) Wed, 11 May 2022 20:35:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-27572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535846#comment-17535846
 ]


Yang Wang commented on FLINK-27572:
-----------------------------------

This is only necessary for Flink 1.14 and earlier, since FLINK-27495 
will cover 1.15 and later. Right?

Leaving ZK HA aside, maybe we could simply verify that the HA 
ConfigMaps exist.
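
A rough sketch of what such an existence check might look like, written in 
Python only for illustration (the operator itself is Java). It works on 
ConfigMap objects that have already been listed from the namespace as plain 
dicts. The label keys/values ({{app}}, {{configmap-type=high-availability}}) 
and the function names are assumptions for this sketch and should be checked 
against the Flink version in use.

```python
from typing import Iterable, Mapping

# Assumption: Flink's Kubernetes HA services label their ConfigMaps like this.
# Verify against the Flink version you run before relying on it.
ASSUMED_HA_LABELS = {"configmap-type": "high-availability"}


def ha_metadata_present(config_maps: Iterable[Mapping], cluster_id: str) -> bool:
    """Return True if at least one HA ConfigMap exists for the given cluster.

    `config_maps` is a listing of ConfigMap objects as dicts, for example
    the `items` of a Kubernetes list response for the job's namespace.
    """
    wanted = {"app": cluster_id, **ASSUMED_HA_LABELS}
    for cm in config_maps:
        labels = (cm.get("metadata") or {}).get("labels") or {}
        if all(labels.get(key) == value for key, value in wanted.items()):
            return True
    return False


def require_ha_metadata(config_maps: Iterable[Mapping], cluster_id: str) -> None:
    """Hypothetical guard: fail instead of restoring from empty state."""
    if not ha_metadata_present(config_maps, cluster_id):
        raise RuntimeError(
            f"HA metadata not found for cluster '{cluster_id}'; "
            "last-state restore needs manual intervention"
        )
```

The guard raises rather than returning a flag so that a last-state upgrade 
cannot silently continue with empty state.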

Another question is how users could fix this manually. They would need to 
find the latest external checkpoint and specify it via 
{{execution.savepoint.path}}.
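
To make the manual step concrete, here is a small sketch, again in Python 
purely for illustration. It assumes the user has a flat listing of file paths 
under a single job's checkpoint directory, and that a completed checkpoint is 
a {{chk-<n>}} directory containing a {{_metadata}} file. The function names 
are hypothetical helpers, not part of Flink or the operator.

```python
import re
from typing import Iterable, Optional

# A completed checkpoint is assumed to look like .../chk-<n>/_metadata
_METADATA_PATH = re.compile(r"(.*/chk-(\d+))/_metadata$")


def latest_checkpoint(paths: Iterable[str]) -> Optional[str]:
    """Pick the highest-numbered chk-<n> directory that has a _metadata file.

    `paths` is a listing of file paths for ONE job's checkpoint directory
    (mixing several jobs would compare unrelated checkpoint ids).
    """
    best_dir = None
    best_id = -1
    for path in paths:
        match = _METADATA_PATH.match(path)
        if match is None:
            continue  # not a completed checkpoint's metadata file
        checkpoint_id = int(match.group(2))  # compare numerically, not as text
        if checkpoint_id > best_id:
            best_dir, best_id = match.group(1), checkpoint_id
    return best_dir


def savepoint_config_line(checkpoint_dir: str) -> str:
    """Render the configuration entry the user would set by hand."""
    return f"execution.savepoint.path: {checkpoint_dir}"
```

For example, given a listing with {{chk-9/_metadata}}, {{chk-12/_metadata}} 
and a {{chk-13}} directory without {{_metadata}}, this picks {{chk-12}}, 
since {{chk-13}} is treated as incomplete.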

> Verify HA Metadata present before performing last-state restore
> ---------------------------------------------------------------
>
>                 Key: FLINK-27572
>                 URL: https://issues.apache.org/jira/browse/FLINK-27572
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Gyula Fora
>            Priority: Blocker
>             Fix For: kubernetes-operator-1.0.0
>
>
> When we restore a job using the last-state logic we need to verify that the 
> HA metadata has not been deleted. And if it's not there we need to simply 
> throw an error because this requires manual user intervention.
> This only applies when the FlinkDeployment is not already in a suspended 
> state with recorded last state information.
> The problem can be reproduced easily in 1.14 by triggering a fatal job error 
> (turn off restart-strategy and kill the TM, for example). In these cases HA 
> metadata will be removed, and the next last-state upgrade should throw an 
> error instead of restoring from a completely empty state. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
