[
https://issues.apache.org/jira/browse/FLINK-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878016#comment-15878016
]
ASF GitHub Bot commented on FLINK-5763:
---------------------------------------
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3345
Very good change! Looked through it, nothing to complain about ;-)
Merging this...
> Make savepoints self-contained and relocatable
> ----------------------------------------------
>
> Key: FLINK-5763
> URL: https://issues.apache.org/jira/browse/FLINK-5763
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
>
> After a user has triggered a savepoint, a single savepoint file will be
> returned as a handle to the savepoint. A savepoint to {{<target>}} creates a
> savepoint file like {{<target>/savepoint-<randomSuffix>}}.
> This file contains the metadata of the corresponding checkpoint, but not the
> actual program state. While this works well for short-term management
> (pause-and-resume a job), it makes it hard to manage savepoints over longer
> periods of time.
> h4. Problems
> h5. Scattered Checkpoint Files
> For file system based checkpoints (FsStateBackend, RocksDBStateBackend) this
> results in the savepoint referencing files from the checkpoint directory
> (usually different from {{<target>}}). For users, it is virtually impossible to
> tell which checkpoint files belong to a savepoint and which are lingering
> around. This can easily lead to accidentally invalidating a savepoint by
> deleting checkpoint files.
> h5. Savepoints Not Relocatable
> Even if a user is able to figure out which checkpoint files belong to a
> savepoint, moving these files will invalidate the savepoint as well, because
> the metadata file references absolute file paths.
> h5. Forced to Use CLI for Disposal
> Because of the scattered files, the user is in practice forced to use Flink’s
> CLI to dispose of a savepoint. Disposal should instead be possible within
> the user’s environment via a plain file system delete operation.
> h4. Proposal
> In order to solve the described problems, savepoints should contain all their
> state, both metadata and program state, inside a single directory.
> Furthermore, the metadata must hold only relative references to the checkpoint
> files. This makes it obvious which files make up the state of a savepoint and
> makes it possible to relocate a savepoint by moving its directory.
> h5. Desired File Layout
> Triggering a savepoint to {{<target>}} creates a directory as follows:
> {code}
> <target>/savepoint-<jobId>-<randomSuffix>
> +-- _metadata
> +-- data-<randomSuffix> [1 or more]
> {code}
> We include the JobID in the savepoint directory name in order to give some
> hints about which job a savepoint belongs to.
> h5. CLI
> - Trigger: When triggering a savepoint to {{<target>}} the savepoint
> directory will be returned as the handle to the savepoint.
> - Restore: Users can restore by pointing to the directory or the _metadata
> file. The data files should be required to be in the same directory as the
> _metadata file.
> - Dispose: The disposal command should be deprecated and eventually removed.
> While deprecated, disposal can happen by specifying the directory or the
> _metadata file (same as restore).
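
The restore rule in the description above (accept either the directory or the _metadata file, and require the data files to sit next to _metadata) can be sketched roughly as follows. This is only an illustration in Python, not Flink's implementation: the helper names and the one-relative-name-per-line _metadata format are assumptions made for the example, and the real metadata format is binary.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

METADATA_NAME = "_metadata"


def write_savepoint(target: Path, job_id: str, suffix: str, data_names: List[str]) -> Path:
    """Hypothetical writer: creates <target>/savepoint-<jobId>-<randomSuffix>."""
    directory = target / f"savepoint-{job_id}-{suffix}"
    directory.mkdir(parents=True)
    for name in data_names:
        (directory / name).write_bytes(b"state")
    # Assumed format: only relative file names are recorded, one per line.
    (directory / METADATA_NAME).write_text("\n".join(data_names))
    return directory


def resolve_savepoint_dir(handle: Path) -> Path:
    """Accept either the savepoint directory or its _metadata file."""
    directory = handle.parent if handle.name == METADATA_NAME else handle
    if not (directory / METADATA_NAME).is_file():
        raise FileNotFoundError(f"no {METADATA_NAME} in {directory}")
    return directory


def data_files(handle: Path) -> List[Path]:
    """Resolve the relative references against the savepoint directory."""
    directory = resolve_savepoint_dir(handle)
    names = (directory / METADATA_NAME).read_text().splitlines()
    files = [directory / name for name in names]
    missing = [f.name for f in files if not f.is_file()]
    if missing:
        raise FileNotFoundError(f"data files missing next to {METADATA_NAME}: {missing}")
    return files


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    savepoint = write_savepoint(root / "old", "job1", "ab12", ["data-1", "data-2"])
    # Relocating is a plain directory move, because no absolute paths are stored.
    moved = Path(shutil.move(str(savepoint), str(root / "new")))
    print([f.name for f in data_files(moved / METADATA_NAME)])
    # Disposal is a plain recursive delete of the directory.
    shutil.rmtree(moved)
```

Because every reference is resolved against the directory that holds _metadata, the same lookup works before and after the move, and disposal needs no knowledge of any other location.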