[ 
https://issues.apache.org/jira/browse/FLINK-26577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509633#comment-17509633
 ] 

Yang Wang commented on FLINK-26577:
-----------------------------------

{quote}I agree with that but we also need to make sure that the job actually 
started properly after HA was enabled, so that the configmaps with at least the 
initial state are created.
{quote}
If the checkpoint interval is configured to be very large (e.g. 30 minutes), I 
am afraid we cannot wait until a checkpoint has completed.
{quote}Right now the last-state upgrade mode allows you to make upgrades even 
when in a not-ready state, but maybe when switching to last-state, in addition 
to having HA previously enabled, the job should be in READY.
{quote}
Maybe we could always trigger a savepoint when users change the upgrade mode 
from stateless/savepoint to last-state. This is a safeguard so that we never 
lose state. Of course, this also requires the FlinkDeployment to be {{READY}}.
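
A minimal sketch of this rule, only to make the proposal concrete. All type and 
method names below ({{UpgradeMode}}, {{SuspendAction}}, {{SuspendPlanner.plan}}) 
are hypothetical and are not the operator's actual API. It assumes the decision 
depends only on the previous mode, the target mode, and whether the 
FlinkDeployment is {{READY}}.

```java
// Hypothetical sketch: these types are illustrative, not the operator's real classes.
enum UpgradeMode { STATELESS, SAVEPOINT, LAST_STATE }

enum SuspendAction {
    // Target is not last-state: keep whatever the target mode does today.
    USE_TARGET_MODE_DEFAULT,
    // Already in last-state: delete the deployment and rely on HA metadata.
    DELETE_DEPLOYMENT,
    // Switching into last-state: take a savepoint first as a safeguard.
    TRIGGER_SAVEPOINT_FIRST,
    // Switching into last-state but the job is not READY: no savepoint possible yet.
    WAIT_FOR_READY
}

final class SuspendPlanner {
    private SuspendPlanner() {}

    // Picks a suspend action from the previous mode as well as the target mode.
    static SuspendAction plan(UpgradeMode previous, UpgradeMode target, boolean ready) {
        if (target != UpgradeMode.LAST_STATE) {
            return SuspendAction.USE_TARGET_MODE_DEFAULT;
        }
        if (previous == UpgradeMode.LAST_STATE) {
            // Current behaviour: allowed even when the job is not READY.
            return SuspendAction.DELETE_DEPLOYMENT;
        }
        // stateless/savepoint -> last-state: HA data may be missing or stale.
        return ready ? SuspendAction.TRIGGER_SAVEPOINT_FIRST : SuspendAction.WAIT_FOR_READY;
    }
}
```

For example, {{SuspendPlanner.plan(UpgradeMode.STATELESS, UpgradeMode.LAST_STATE, false)}} 
yields {{WAIT_FOR_READY}}, because a savepoint cannot be triggered until the 
job is running.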

> Avoid state loss when switching to last-state upgrade mode
> ----------------------------------------------------------
>
>                 Key: FLINK-26577
>                 URL: https://issues.apache.org/jira/browse/FLINK-26577
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kubernetes Operator
>            Reporter: Gyula Fora
>            Priority: Major
>
> At the moment there are several corner cases which can lead to accidental 
> state loss (or at least weird behaviour) when switching to last-state upgrade 
> mode from other modes.
> 2 cases that immediately come to mind:
> savepoint to last-state: 
> When the new upgrade mode is last-state, the job deployment will simply be 
> deleted. If HA was not enabled previously, the last savepoint might be very 
> far back in time. 
> stateless to last-state:
> If checkpointing and HA are not enabled, the deployment will simply be killed 
> as before, and we might start the job from empty state. Maybe taking a 
> savepoint and continuing from there would be the right approach in this case.
> Maybe when switching between modes we should consider the previous mode as 
> well as the target mode when deciding on the suspend strategy. We could 
> also simply disallow switching to last-state if HA was not enabled previously, 
> but that might be too restrictive.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
