[
https://issues.apache.org/jira/browse/FLINK-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15555450#comment-15555450
]
ASF GitHub Bot commented on FLINK-4512:
---------------------------------------
GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/2608
[FLINK-4512] [FLIP-10] Add option to persist periodic checkpoints
## Introduction
This is the first part of
[FLIP-10](https://cwiki.apache.org/confluence/display/FLINK/FLIP-10%3A+Unify+Checkpoints+and+Savepoints),
allowing users to persist periodic checkpoints.
Persistent checkpoints behave very much like regular periodic checkpoints,
except for the following differences:
1. They persist their metadata (like savepoints).
2. They are not discarded when the owning job fails permanently.
Furthermore, they can be configured to not be discarded when the job is
cancelled.
This means that if a job fails permanently, the user will have a checkpoint
available to restore from. As an example, consider the following scenario: a job
runs smoothly until it hits a bad record that it cannot handle. Currently, the
job tries to recover, but it hits the bad record again and keeps on failing.
With persistent checkpoints, the user can update the program to handle bad
records and restore from the most recent persistent checkpoint.
## CheckpointConfig
This adds the following `@PublicEvolving` methods to `CheckpointConfig`:
```
enablePersistentCheckpoints(String targetDirectory);
enablePersistentCheckpoints(String targetDirectory,
    PersistentCheckpointCleanup cleanup);
```
The `PersistentCheckpointCleanup` enum defines how persistent checkpoints are
cleaned up when the owning job is cancelled. Since most streaming jobs are
currently stopped via cancellation, the default is to clean up persistent
checkpoints. The user can override this behaviour via the enum.
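
The disposal rules described in this PR and in the issue can be sketched roughly as follows. This is only an illustration of the intended semantics, not the code in this PR: the class `CheckpointDisposalSketch`, the enum `CleanupOnCancel` and its values, and the method `shouldDiscard` are hypothetical names, since the values of `PersistentCheckpointCleanup` are not listed here.

```java
/** Illustrative sketch only; all names here are hypothetical, not Flink's API. */
final class CheckpointDisposalSketch {

    /** The globally terminal job states named in the issue description. */
    enum JobStatus { FINISHED, CANCELLED, FAILED }

    /** Hypothetical stand-in for the cleanup choice applied on cancellation. */
    enum CleanupOnCancel { DISCARD, RETAIN }

    /**
     * Returns true if a checkpoint should be discarded once the owning job
     * reaches the given globally terminal state.
     */
    static boolean shouldDiscard(boolean persistent, CleanupOnCancel cleanup, JobStatus status) {
        if (!persistent) {
            // Regular checkpoints are cleaned up in every terminal state.
            return true;
        }
        switch (status) {
            case FINISHED:
                // Persistent checkpoints are cleaned up when the job finishes.
                return true;
            case CANCELLED:
                // On cancellation the configured cleanup mode decides.
                return cleanup == CleanupOnCancel.DISCARD;
            case FAILED:
                // Kept so the user can restore after a permanent failure.
                return false;
            default:
                throw new IllegalArgumentException("Unknown status: " + status);
        }
    }

    public static void main(String[] args) {
        for (JobStatus status : JobStatus.values()) {
            System.out.println(status + " -> discard="
                    + shouldDiscard(true, CleanupOnCancel.RETAIN, status));
        }
    }
}
```

In this sketch, `CleanupOnCancel.DISCARD` corresponds to the default described above, where cancelling a job cleans up its persistent checkpoints.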
## REST API
The REST API exposes the external-path of the most recent persistent
checkpoint. This path is also displayed in the web UI.

## Deprecate savepoint state backends (FLINK-4507)
Furthermore, the savepoint state backends have been removed and all
savepoints now go to files. The corresponding configuration keys have been
removed or deprecated:
`savepoints.state.backend.fs.dir` has been deprecated in favour of
`state.savepoints.dir`. `savepoints.state.backend` has been removed.
## Allow to specify custom savepoint directory (FLINK-4509)
Previously, the target directory for savepoints could only be set in the Flink
configuration. With this change, it can be overridden per savepoint:
```
bin/flink savepoint <jobId> [targetDirectory]
```
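
One way to picture how the savepoint target directory might be resolved after these two changes is sketched below. The configuration keys are the ones named above; the class `SavepointDirSketch`, the method `resolveTargetDir`, and the exact lookup order (explicit argument first, then `state.savepoints.dir`, then the deprecated key) are assumptions made for illustration and may differ from the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; class and method names are hypothetical. */
final class SavepointDirSketch {

    static final String SAVEPOINT_DIR_KEY = "state.savepoints.dir";
    static final String DEPRECATED_SAVEPOINT_DIR_KEY = "savepoints.state.backend.fs.dir";

    /**
     * Resolves the savepoint target directory.
     *
     * @param explicitTargetDirectory the optional [targetDirectory] argument, or null if absent
     * @param config                  key/value pairs standing in for the Flink configuration
     */
    static Optional<String> resolveTargetDir(String explicitTargetDirectory,
                                             Map<String, String> config) {
        if (explicitTargetDirectory != null) {
            // An explicit per-savepoint directory takes precedence.
            return Optional.of(explicitTargetDirectory);
        }
        String dir = config.get(SAVEPOINT_DIR_KEY);
        if (dir == null) {
            // Assumed fallback to the deprecated key for older configurations.
            dir = config.get(DEPRECATED_SAVEPOINT_DIR_KEY);
        }
        return Optional.ofNullable(dir);
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(DEPRECATED_SAVEPOINT_DIR_KEY, "file:///tmp/old-savepoints");
        System.out.println(resolveTargetDir(null, config));
        System.out.println(resolveTargetDir("file:///tmp/custom", config));
    }
}
```

The directories in `main` are placeholder values chosen for the example.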
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink 4512-persistent_checkpoints
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2608.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2608
----
commit 004ba0b38ac2b75148910660242808c13746c444
Author: Ufuk Celebi <[email protected]>
Date: 2016-10-06T14:43:42Z
[FLINK-4512] [FLIP-10] Add option to persist periodic checkpoints
[FLINK-4509] [FLIP-10] Specify savepoint directory per savepoint
[FLINK-4507] [FLIP-10] Deprecate savepoint backend config
----
> Add option for persistent checkpoints
> -------------------------------------
>
> Key: FLINK-4512
> URL: https://issues.apache.org/jira/browse/FLINK-4512
> Project: Flink
> Issue Type: Sub-task
> Components: State Backends, Checkpointing
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
>
> Allow periodic checkpoints to be persisted by writing out their meta data.
> This is what we currently do for savepoints, but in the future checkpoints
> and savepoints are likely to diverge with respect to guarantees they give for
> updatability, etc.
> This means that the difference between persistent checkpoints and savepoints
> in the long term will be that persistent checkpoints can only be restored
> with the same job settings (like parallelism, etc.)
> Regular and persisted checkpoints should behave differently with respect to
> disposal in *globally* terminal job states (FINISHED, CANCELLED, FAILED):
> regular checkpoints are cleaned up in all of these cases whereas persistent
> checkpoints only on FINISHED. Maybe with the option to customize behaviour on
> CANCELLED or FAILED.