[jira] [Commented] (FLINK-4512) Add option for persistent checkpoints

ASF GitHub Bot (JIRA) Fri, 07 Oct 2016 08:54:55 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15555450#comment-15555450
 ]


ASF GitHub Bot commented on FLINK-4512:
---------------------------------------

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/2608

    [FLINK-4512] [FLIP-10] Add option to persist periodic checkpoints

    ## Introduction
    
    This is the first part of [FLIP-10](https://cwiki.apache.org/confluence/display/FLINK/FLIP-10%3A+Unify+Checkpoints+and+Savepoints), 
allowing users to persist periodic checkpoints.
    
    Persistent checkpoints behave very much like regular periodic checkpoints, 
with the following differences:
    
    1. They persist their meta data (like savepoints).
    2. They are not discarded when the owning job fails permanently. 
Furthermore, they can be configured to not be discarded when the job is 
cancelled.
    
    This means that if a job fails permanently, the user will have a checkpoint 
available to restore from. As an example, consider the following scenario: a job 
runs smoothly until it hits a bad record that it cannot handle. With the current 
behaviour, the job tries to recover, hits the bad record again, and keeps on 
failing. With persistent checkpoints, the user can update the program to handle 
bad records and restore from the most recent persistent checkpoint.
    
    ## CheckpointConfig
    
    This adds the following `@PublicEvolving` methods to `CheckpointConfig`:
    
    ```
    enablePersistentCheckpoints(String targetDirectory);
    enablePersistentCheckpoints(String targetDirectory, PersistentCheckpointCleanup cleanup);
    ```
    
    The `PersistentCheckpointCleanup` enum defines how persistent checkpoints are 
cleaned up when the owning job is cancelled. Since most streaming jobs are 
currently stopped via cancellation, the default is to clean up persistent 
checkpoints. The user can override this behaviour via the enum.
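
The disposal rules described here and in FLINK-4512 can be summarised in a small, self-contained sketch. It does not use Flink classes; the enum constants `DISCARD_ON_CANCELLATION` and `RETAIN_ON_CANCELLATION` and the method `shouldDiscard` are hypothetical names chosen for illustration, since the values of `PersistentCheckpointCleanup` are not listed in this description.

```java
// Sketch only: models the disposal rules described in this pull request and in FLINK-4512.
// All names below are hypothetical and are not taken from Flink's actual API.
class CheckpointCleanupSketch {

    /** Globally terminal job states named in FLINK-4512. */
    enum TerminalState { FINISHED, CANCELLED, FAILED }

    /** Hypothetical stand-in for the values of PersistentCheckpointCleanup. */
    enum Cleanup { DISCARD_ON_CANCELLATION, RETAIN_ON_CANCELLATION }

    /** Returns true if a checkpoint should be discarded once the job reaches the given state. */
    static boolean shouldDiscard(boolean persistent, Cleanup cleanup, TerminalState state) {
        if (!persistent) {
            // Regular checkpoints are cleaned up in every globally terminal state.
            return true;
        }
        switch (state) {
            case FINISHED:
                // Persistent checkpoints are still cleaned up when the job finishes.
                return true;
            case FAILED:
                // Persistent checkpoints survive a permanent failure.
                return false;
            case CANCELLED:
                // Behaviour on cancellation is configurable.
                return cleanup == Cleanup.DISCARD_ON_CANCELLATION;
            default:
                throw new IllegalArgumentException("Unknown state: " + state);
        }
    }

    public static void main(String[] args) {
        for (TerminalState state : TerminalState.values()) {
            for (Cleanup cleanup : Cleanup.values()) {
                System.out.println(state + " / " + cleanup + " -> discard persistent checkpoint: "
                        + shouldDiscard(true, cleanup, state));
            }
        }
    }
}
```

With the default assumed here (`DISCARD_ON_CANCELLATION`), a persistent checkpoint differs from a regular one only when the job fails permanently.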
    
    ## REST API
    
    The REST API exposes the external path of the most recent persistent 
checkpoint. The same path is also displayed in the web UI.
    
    ![screen shot 2016-10-07 at 17 50 44](https://cloud.githubusercontent.com/assets/1756620/19196699/d0d5065a-8cb6-11e6-8b13-c6bacc4ebe19.png)
    
    ## Deprecate savepoint state backends (FLINK-4507)
    
    Furthermore, the savepoint state backends have been removed and all 
savepoints now go to files. The corresponding configuration keys have been 
removed or deprecated:
    
    `savepoints.state.backend.fs.dir` has been deprecated in favour of 
`state.savepoints.dir`. `savepoints.state.backend` has been removed.
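
As an illustration of what the deprecation could mean for existing configurations, here is a minimal sketch of a lookup that prefers the new key and falls back to the deprecated one. This is an assumption about the intended behaviour, not a description of how Flink resolves deprecated keys; the class and method names are hypothetical, and only the two key strings come from this description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch only: a plain map stands in for the Flink configuration.
// The fallback order (new key first, then the deprecated key) is an assumption.
class SavepointDirLookupSketch {

    static final String NEW_KEY = "state.savepoints.dir";
    static final String DEPRECATED_KEY = "savepoints.state.backend.fs.dir";

    /** Returns the configured savepoint directory, preferring the new key. */
    static Optional<String> savepointDirectory(Map<String, String> config) {
        String value = config.get(NEW_KEY);
        if (value == null) {
            // Fall back to the deprecated key so that older configurations keep working.
            value = config.get(DEPRECATED_KEY);
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        // Hypothetical directory, used only for this example.
        config.put(DEPRECATED_KEY, "file:///tmp/old-savepoints");
        System.out.println(savepointDirectory(config).orElse("<not configured>"));
    }
}
```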
    
    ## Allow to specify custom savepoint directory (FLINK-4509)
    
    Previously, the target directory for savepoints could only be set in the 
Flink configuration. With this change, it can be overridden per savepoint:
    
    ```
    bin/flink savepoint <jobId> [targetDirectory]
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink 4512-persistent_checkpoints

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2608.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2608
    
----
commit 004ba0b38ac2b75148910660242808c13746c444
Author: Ufuk Celebi
Date:   2016-10-06T14:43:42Z

    [FLINK-4512] [FLIP-10] Add option to persist periodic checkpoints
    
    [FLINK-4509] [FLIP-10] Specify savepoint directory per savepoint
    [FLINK-4507] [FLIP-10] Deprecate savepoint backend config

----


> Add option for persistent checkpoints
> -------------------------------------
>
>                 Key: FLINK-4512
>                 URL: https://issues.apache.org/jira/browse/FLINK-4512
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>
> Allow periodic checkpoints to be persisted by writing out their meta data. 
> This is what we currently do for savepoints, but in the future checkpoints 
> and savepoints are likely to diverge with respect to guarantees they give for 
> updatability, etc.
> This means that the difference between persistent checkpoints and savepoints 
> in the long term will be that persistent checkpoints can only be restored 
> with the same job settings (like parallelism, etc.)
> Regular and persistent checkpoints should behave differently with respect to 
> disposal in *globally* terminal job states (FINISHED, CANCELLED, FAILED): 
> regular checkpoints are cleaned up in all of these cases, whereas persistent 
> checkpoints are cleaned up only on FINISHED, possibly with an option to 
> customize the behaviour on CANCELLED or FAILED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
