[ https://issues.apache.org/jira/browse/FLINK-26577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511659#comment-17511659 ]
Yang Wang commented on FLINK-26577:
-----------------------------------
After more consideration, I am questioning whether we need to allow changing
the upgrade mode while the job is running. It would make the reconciliation
more complicated and invite more bugs in the future. From a user's point of
view, I think it is reasonable to require suspending the job first and only
then allow the upgrade mode to be changed.
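
A minimal sketch of the rule proposed here, assuming a validation step that
compares the previously applied spec with the new one. All names in it
(UpgradeModeChangeValidator, JobState, UpgradeMode, validate) are hypothetical
and are not taken from the operator code base.

```java
import java.util.Optional;

public class UpgradeModeChangeValidator {

    // Hypothetical enums mirroring the upgrade modes and job states discussed in this issue.
    enum UpgradeMode { STATELESS, SAVEPOINT, LAST_STATE }
    enum JobState { RUNNING, SUSPENDED }

    /**
     * Returns an error message if the upgrade mode is changed while the job is
     * running, otherwise an empty Optional.
     */
    static Optional<String> validate(JobState currentState, UpgradeMode oldMode, UpgradeMode newMode) {
        if (currentState == JobState.RUNNING && oldMode != newMode) {
            return Optional.of(
                    "Cannot change upgrade mode from " + oldMode + " to " + newMode
                            + " while the job is running; suspend the job first.");
        }
        // Either the mode is unchanged or the job is already suspended.
        return Optional.empty();
    }
}
```

With such a check the reconciler would never see a mode change on a running
job, so it would not need a separate suspend path for every pair of modes.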
cc [~gyfora]
> Avoid state loss when switching to last-state upgrade mode
> ----------------------------------------------------------
>
> Key: FLINK-26577
> URL: https://issues.apache.org/jira/browse/FLINK-26577
> Project: Flink
> Issue Type: Sub-task
> Components: Kubernetes Operator
> Reporter: Gyula Fora
> Assignee: Yang Wang
> Priority: Major
>
> At the moment there are several corner cases which can lead to accidental
> state loss (or at least weird behaviour) when switching to last-state upgrade
> mode from other modes.
> 2 cases that immediately come to mind:
> savepoint to last-state:
> When the new upgrade mode is last-state, the job deployment will simply be
> deleted. If HA was not enabled previously, the last savepoint might be very
> far back in time.
> stateless to last-state:
> If checkpointing and HA are not enabled, the deployment will simply be killed
> as before and we might start the job from empty state. Maybe taking a
> savepoint and continuing from there would be the right approach in this case.
> Maybe when switching between modes we should consider the previous mode as
> well as the target mode when deciding on the suspend strategy. We could
> also simply not allow switching to last-state if HA was not enabled
> previously, but that might be too restrictive.
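
The alternative floated in the description, choosing the suspend strategy from
both the previous and the target mode, could be sketched roughly as below. This
is only an illustration under stated assumptions: SuspendStrategySelector,
SuspendStrategy and select are hypothetical names, and falling back to a
savepoint when HA was not enabled is one of the options mentioned above, not an
agreed design.

```java
public class SuspendStrategySelector {

    // Hypothetical enums; the operator's real types may differ.
    enum UpgradeMode { STATELESS, SAVEPOINT, LAST_STATE }
    enum SuspendStrategy { CANCEL, SAVEPOINT, DELETE_DEPLOYMENT }

    /**
     * Picks how to stop the running job, looking at the mode it was deployed
     * with as well as the mode requested by the new spec.
     */
    static SuspendStrategy select(UpgradeMode previous, UpgradeMode target, boolean haWasEnabled) {
        // Switching to last-state without HA metadata from the previous run:
        // deleting the deployment could lose state, so take a savepoint instead.
        if (target == UpgradeMode.LAST_STATE
                && previous != UpgradeMode.LAST_STATE
                && !haWasEnabled) {
            return SuspendStrategy.SAVEPOINT;
        }
        switch (target) {
            case LAST_STATE:
                return SuspendStrategy.DELETE_DEPLOYMENT;
            case SAVEPOINT:
                return SuspendStrategy.SAVEPOINT;
            default:
                return SuspendStrategy.CANCEL;
        }
    }
}
```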
--
This message was sent by Atlassian Jira
(v8.20.1#820001)