[jira] [Updated] (FLINK-23553) Trigger global failover for synchronous savepoints on CheckpointCoordinator

Till Rohrmann (Jira) Wed, 11 Aug 2021 02:48:04 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Till Rohrmann updated FLINK-23553:
----------------------------------
    Fix Version/s:     (was: 1.14.0)

> Trigger global failover for synchronous savepoints on CheckpointCoordinator
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-23553
>                 URL: https://issues.apache.org/jira/browse/FLINK-23553
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.3, 1.13.1, 1.12.4
>            Reporter: Dawid Wysakowicz
>            Priority: Major
>
> We should trigger a global job failover if a {{stop-with-savepoint --drain}} 
> fails.
> The situation is clear-cut in drain mode: if the savepoint fails, we cannot 
> continue, because we have already flushed all data and prepared the state 
> for finishing. We cannot simply resume processing records.
> It is more debatable without drain mode, where we could theoretically 
> continue processing records; however, unifying the two modes is also a 
> reasonable approach.
> This task is about triggering the failover on the CheckpointCoordinator. We 
> should make sure that once a synchronous checkpoint has been triggered, no 
> newer checkpoints are scheduled.
> If a synchronous savepoint fails for whatever reason, we should trigger a 
> global failover for the job.
> We might add safety guards (checkState calls for situations we missed) on 
> the Task in a follow-up ticket.
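
The two requirements in the description (no newer checkpoints once a synchronous savepoint is pending, and a global failover if that savepoint fails) can be illustrated with a minimal sketch. This is not Flink's CheckpointCoordinator code: SketchCoordinator, GlobalFailureHandler, triggerCheckpoint and onCheckpointFailed are hypothetical names chosen for illustration, and tolerating regular checkpoint failures is an assumption made only to contrast the two cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a coordinator; it does NOT mirror Flink's real API.
final class SketchCoordinator {

    /** Hypothetical callback that would fail the whole job (global failover). */
    interface GlobalFailureHandler {
        void failGlobal(Throwable cause);
    }

    private static final long NONE = -1L;

    private final GlobalFailureHandler failureHandler;
    private long pendingSyncSavepointId = NONE;
    private long nextCheckpointId = 1L;

    SketchCoordinator(GlobalFailureHandler failureHandler) {
        this.failureHandler = failureHandler;
    }

    /**
     * Returns the id of the triggered checkpoint, or -1 if the request was
     * declined because a synchronous savepoint is already pending.
     */
    long triggerCheckpoint(boolean synchronousSavepoint) {
        if (pendingSyncSavepointId != NONE) {
            // A synchronous savepoint is in flight: schedule nothing newer.
            return NONE;
        }
        long id = nextCheckpointId++;
        if (synchronousSavepoint) {
            pendingSyncSavepointId = id;
        }
        return id;
    }

    /** Reports a failed checkpoint; a failed synchronous savepoint fails the job. */
    void onCheckpointFailed(long checkpointId, Throwable cause) {
        if (checkpointId != NONE && checkpointId == pendingSyncSavepointId) {
            pendingSyncSavepointId = NONE;
            failureHandler.failGlobal(cause);
        }
        // Assumption of this sketch: other checkpoint failures are tolerated.
    }
}

final class SketchDemo {
    public static void main(String[] args) {
        List<Throwable> globalFailures = new ArrayList<>();
        SketchCoordinator coordinator = new SketchCoordinator(globalFailures::add);

        long savepointId = coordinator.triggerCheckpoint(true);
        long declined = coordinator.triggerCheckpoint(false);
        coordinator.onCheckpointFailed(savepointId, new RuntimeException("savepoint failed"));

        System.out.println("declined=" + (declined == -1) + ", globalFailures=" + globalFailures.size());
    }
}
```

In this sketch the handler is invoked regardless of whether drain mode was requested, which matches the proposal to unify the two modes.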



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
