[
https://issues.apache.org/jira/browse/FLINK-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387564#comment-15387564
]
ASF GitHub Bot commented on FLINK-4201:
---------------------------------------
GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/2276
[FLINK-4201] [runtime] Forward suspend to checkpoint coordinator
Suspending a job shut down the checkpoint coordinator and thereby removed
the job's checkpoint state. For standalone recovery mode this is OK, as
no state can be recovered anyway (unchanged in this PR). In HA mode, though,
this led to the removal of checkpoint state that we
actually want to keep for recovery.
We have the following behaviour now:
JobState                 | Standalone | High Availability
-------------------------|------------|------------------
SUSPENDED                | Discard    | Keep
FINISHED/FAILED/CANCELED | Discard    | Discard
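The table above can be read as a small decision function. The following is only a minimal sketch of that rule, not the code in this PR: the class, enum, and method names (CheckpointRetentionSketch, JobState, RecoveryMode, Action, onCoordinatorShutdown) are hypothetical and are not Flink's actual types.

```java
// Hypothetical sketch of the behaviour table; none of these types are Flink classes.
final class CheckpointRetentionSketch {

    // Job states listed in the table (assumed names for illustration).
    enum JobState { SUSPENDED, FINISHED, FAILED, CANCELED }

    // The two recovery modes compared in the table.
    enum RecoveryMode { STANDALONE, HIGH_AVAILABILITY }

    // What happens to checkpoint state when the coordinator goes away.
    enum Action { DISCARD, KEEP }

    // Checkpoint state is kept only for a suspended job in HA mode;
    // every other combination in the table discards it.
    static Action onCoordinatorShutdown(JobState state, RecoveryMode mode) {
        if (state == JobState.SUSPENDED && mode == RecoveryMode.HIGH_AVAILABILITY) {
            return Action.KEEP;
        }
        return Action.DISCARD;
    }

    public static void main(String[] args) {
        // Print the full decision matrix for a quick visual check against the table.
        for (JobState state : JobState.values()) {
            for (RecoveryMode mode : RecoveryMode.values()) {
                System.out.println(state + " / " + mode + " -> "
                        + onCoordinatorShutdown(state, mode));
            }
        }
    }
}
```

The single special case (SUSPENDED under HA) is written as an explicit branch so that any state added later would fall through to Discard unless someone deliberately opts it into Keep. That default is a design choice of this sketch, not something stated in the PR.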
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink 4201-discard_checkpoint
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2276.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2276
----
commit 65bec557bafcfcfd53b6246f0a2154d259b0d6dd
Author: Ufuk Celebi <[email protected]>
Date: 2016-07-19T14:30:23Z
[FLINK-4201] [runtime] Forward suspend to checkpoint coordinator
Suspended jobs were leading to shutdown of the checkpoint coordinator
and hence removal of checkpoint state. For standalone recovery mode
this is OK as no state can be recovered anyway (unchanged in this PR).
For HA though this led to removal of checkpoint state, which we
actually want to keep for recovery.
We have the following behaviour now:
-----------+------------+-------------------
           | Standalone | High Availability
-----------+------------+-------------------
SUSPENDED  | Discard    | Keep
-----------+------------+-------------------
FINISHED/  | Discard    | Discard
FAILED/    |            |
CANCELED   |            |
-----------+------------+-------------------
----
> Checkpoints for jobs in non-terminal state (e.g. suspended) get deleted
> -----------------------------------------------------------------------
>
> Key: FLINK-4201
> URL: https://issues.apache.org/jira/browse/FLINK-4201
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Reporter: Stefan Richter
> Assignee: Ufuk Celebi
> Priority: Blocker
>
> For example, when shutting down a Yarn session, according to the logs
> checkpoints for jobs that did not terminate are deleted. In the shutdown
> hook, removeAllCheckpoints is called and removes checkpoints that should
> still be kept.