[ https://issues.apache.org/jira/browse/FLINK-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann updated FLINK-23553:
----------------------------------
Fix Version/s: (was: 1.14.0)
> Trigger global failover for synchronous savepoints on CheckpointCoordinator
> ---------------------------------------------------------------------------
>
> Key: FLINK-23553
> URL: https://issues.apache.org/jira/browse/FLINK-23553
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.3, 1.13.1, 1.12.4
> Reporter: Dawid Wysakowicz
> Priority: Major
>
> We should trigger a global job failover in case a {{stop-with-savepoint
> --drain}} fails.
> The situation is clear-cut in the drain mode: if the savepoint fails, we
> cannot continue, because we have already flushed all data and prepared the
> state for finishing, so we cannot simply resume processing records.
> It is more debatable for the mode without drain, where we could theoretically
> continue processing records; however, unifying the two modes is also a good
> approach.
> This task is about triggering the failover on the CheckpointCoordinator. We
> should make sure that once a synchronous checkpoint has been triggered, no
> newer checkpoints are scheduled.
> If a synchronous savepoint fails for whatever reason, we should trigger a
> global failover for the job.
> We might add safety guards (checkState calls) for situations we missed on
> the Task in a follow-up ticket.
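A minimal sketch of the two rules in the description, written in Java since Flink is a Java code base: once a synchronous savepoint has been triggered, newer checkpoints are refused, and any failure of that savepoint escalates to a global failover. The class `SyncSavepointCoordinator`, its methods, the `TriggerResult` enum, and the `globalFailoverHandler` callback are all hypothetical illustrations; they are not Flink's actual `CheckpointCoordinator` API and do not reflect how the ticket was implemented.

```java
import java.util.function.Consumer;

// Hypothetical toy model, NOT Flink's CheckpointCoordinator API.
class SyncSavepointCoordinator {

    /** Outcome of a trigger request. */
    public enum TriggerResult { TRIGGERED, REJECTED_SYNC_SAVEPOINT_PENDING }

    // Set once a synchronous savepoint (stop-with-savepoint) has been triggered.
    // Resetting it after a restart is out of scope for this sketch.
    private boolean syncSavepointTriggered = false;

    // Hypothetical hook standing in for the scheduler's global failover.
    private final Consumer<Throwable> globalFailoverHandler;

    SyncSavepointCoordinator(Consumer<Throwable> globalFailoverHandler) {
        this.globalFailoverHandler = globalFailoverHandler;
    }

    // Methods are synchronized so concurrent trigger requests see a consistent flag.

    /** Periodic checkpoints are refused once a synchronous savepoint started. */
    public synchronized TriggerResult triggerPeriodicCheckpoint() {
        if (syncSavepointTriggered) {
            return TriggerResult.REJECTED_SYNC_SAVEPOINT_PENDING;
        }
        return TriggerResult.TRIGGERED;
    }

    /** Starting a synchronous savepoint blocks all newer checkpoints. */
    public synchronized TriggerResult triggerSynchronousSavepoint() {
        if (syncSavepointTriggered) {
            return TriggerResult.REJECTED_SYNC_SAVEPOINT_PENDING;
        }
        syncSavepointTriggered = true;
        return TriggerResult.TRIGGERED;
    }

    /** Any failure of the synchronous savepoint escalates to a global failover. */
    public synchronized void onSynchronousSavepointFailure(Throwable cause) {
        if (!syncSavepointTriggered) {
            // Safety guard in the spirit of a checkState call.
            throw new IllegalStateException("no synchronous savepoint in progress");
        }
        globalFailoverHandler.accept(cause);
    }
}
```

Because the drain mode has already flushed data and prepared state for finishing, the sketch deliberately offers no "resume processing" path after a failed synchronous savepoint: the only exit is the failover callback, which matches the proposal to treat both modes the same way.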
--
This message was sent by Atlassian Jira
(v8.3.4#803005)