[
https://issues.apache.org/jira/browse/FLINK-27153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528106#comment-17528106
]
Gyula Fora commented on FLINK-27153:
------------------------------------
One concern here that I was thinking about is that the savepoint upgrade mode
is a bit more fragile than last-state.
There can be a scenario where the operator fails or is restarted after shutdown
was triggered but before the savepoint was recorded in the status.
In these cases the job can only be recovered through manual intervention
(finding the last savepoint and recreating the deployment).
Based on this, it might not be a good idea to expose this setting for the
last-state upgrade mode, because that would weaken the upgradeMode logic.
So I would opt for exposing this only as
operator.job.upgrade.savepoint.last-state-fallback
This would only take effect if the upgradeMode is set to savepoint and the job
is failing for some reason.
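
To make the proposed semantics concrete, here is a rough sketch of the decision
rule in Python. It is illustrative only, not operator code: the enum, the
function name, the job_healthy flag, and the assumption that the option
defaults to false are all hypothetical, and the key is the one proposed above.

```python
from enum import Enum


class UpgradeMode(Enum):
    STATELESS = "stateless"
    SAVEPOINT = "savepoint"
    LAST_STATE = "last-state"


# Key as proposed in this comment; the final name may differ.
FALLBACK_KEY = "operator.job.upgrade.savepoint.last-state-fallback"


def effective_upgrade_mode(configured, job_healthy, flink_config):
    """Return the upgrade mode to use for this upgrade (hypothetical helper)."""
    # Assumption: the option is off unless explicitly set to "true".
    raw = flink_config.get(FALLBACK_KEY, "false")
    fallback_enabled = str(raw).strip().lower() == "true"

    # The fallback applies only to savepoint mode, and only when the job is
    # failing. Every other combination keeps the configured mode unchanged.
    if configured is UpgradeMode.SAVEPOINT and not job_healthy and fallback_enabled:
        return UpgradeMode.LAST_STATE
    return configured
```

With this rule a healthy job in savepoint mode still takes a savepoint, and a
job configured with last-state is never affected by the option.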
> Allow optional last-state fallback for savepoint upgrade mode
> -------------------------------------------------------------
>
> Key: FLINK-27153
> URL: https://issues.apache.org/jira/browse/FLINK-27153
> Project: Flink
> Issue Type: Improvement
> Components: Kubernetes Operator
> Reporter: Gyula Fora
> Priority: Major
> Fix For: kubernetes-operator-1.0.0
>
>
> In many cases users would prefer to take a savepoint if the job is healthy
> before performing an upgrade but still allow checkpoint based (last-state)
> recovery in case the savepoint fails or the job is generally in a bad state.
> We should add a configuration flag for this that the user can set in the
> flinkConfiguration:
> `kubernetes.operator.job.upgrade.last-state-fallback`