[ 
https://issues.apache.org/jira/browse/FLINK-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387564#comment-15387564
 ] 

ASF GitHub Bot commented on FLINK-4201:
---------------------------------------

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/2276

    [FLINK-4201] [runtime] Forward suspend to checkpoint coordinator

    Previously, suspending a job shut down the checkpoint coordinator and
    thereby removed the job's checkpoint state. In standalone recovery mode
    this is acceptable, as no state can be recovered anyway (unchanged in
    this PR). In HA mode, however, this led to the removal of checkpoint
    state that we actually want to keep for recovery.
    
    We have the following behaviour now:
    
    
    JobState                 | Standalone | High Availability
    -------------------------|------------|-------------------
    SUSPENDED                | Discard    | Keep
    FINISHED/FAILED/CANCELED | Discard    | Discard
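
The table above can be read as a small decision function. The following is only an illustrative sketch and is not taken from the Flink code base: the class name, both enums, and `shouldDiscardCheckpoints` are hypothetical names chosen for this example, and the actual change in this PR may be structured differently.

```java
// Illustrative sketch only; all names here are hypothetical, not Flink APIs.
public class CheckpointRetentionSketch {

    // Job states listed in the table above.
    enum JobState { SUSPENDED, FINISHED, FAILED, CANCELED }

    // The two recovery modes compared in the table.
    enum RecoveryMode { STANDALONE, HIGH_AVAILABILITY }

    // Returns true if checkpoint state should be discarded, false if kept.
    static boolean shouldDiscardCheckpoints(JobState state, RecoveryMode mode) {
        if (state == JobState.SUSPENDED) {
            // A suspended job may be recovered later, but only HA mode
            // can make use of the retained state.
            return mode == RecoveryMode.STANDALONE;
        }
        // FINISHED, FAILED and CANCELED always discard.
        return true;
    }

    public static void main(String[] args) {
        // Print the full decision table for every state/mode combination.
        for (JobState state : JobState.values()) {
            for (RecoveryMode mode : RecoveryMode.values()) {
                String action = shouldDiscardCheckpoints(state, mode) ? "Discard" : "Keep";
                System.out.println(state + " / " + mode + " -> " + action);
            }
        }
    }
}
```

The only combination that keeps checkpoint state is SUSPENDED under high availability; every other combination discards it.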

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink 4201-discard_checkpoint

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2276
    
----
commit 65bec557bafcfcfd53b6246f0a2154d259b0d6dd
Author: Ufuk Celebi <[email protected]>
Date:   2016-07-19T14:30:23Z

    [FLINK-4201] [runtime] Forward suspend to checkpoint coordinator
    
    Suspended jobs were leading to shutdown of the checkpoint coordinator
    and hence removal of checkpoint state. For standalone recovery mode
    this is OK as no state can be recovered anyway (unchanged in this PR).
    For HA though this led to removal of checkpoint state, which we
    actually want to keep for recovery.
    
    We have the following behaviour now:
    
    -----------+------------+-------------------
               | Standalone | High Availability
    -----------+------------+-------------------
     SUSPENDED |  Discard   |       Keep
    -----------+------------+-------------------
     FINISHED/ |  Discard   |     Discard
     FAILED/   |            |
     CANCELED  |            |
    -----------+------------+-------------------

----


> Checkpoints for jobs in non-terminal state (e.g. suspended) get deleted
> -----------------------------------------------------------------------
>
>                 Key: FLINK-4201
>                 URL: https://issues.apache.org/jira/browse/FLINK-4201
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Ufuk Celebi
>            Priority: Blocker
>
> For example, when shutting down a Yarn session, according to the logs 
> checkpoints for jobs that did not terminate are deleted. In the shutdown 
> hook, removeAllCheckpoints is called and removes checkpoints that should 
> still be kept.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
