[
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410315#comment-17410315
]
Yun Tang commented on FLINK-24149:
----------------------------------
[~Feifan Wang] An incremental checkpoint might reference shared files with
different path prefixes, for example one shared file under
{{/user-defined-checkpoint-dir/job_id_1/shared}} and another under
{{/user-defined-checkpoint-dir/job_id_2/shared}}. Could this still satisfy the
relocatable case?
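
To make the question concrete, here is a minimal sketch using only plain java.nio paths (not Flink's actual classes). It assumes the metadata of job_id_2 stores each file reference relative to the job_id_2 directory; the file names file-a and file-b and the target location /new-cluster/ckpt are hypothetical. A reference into job_id_1/shared then has to climb out of the job directory with "..", so it would only resolve after a move if the sibling job_id_1 directory were relocated as well.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RelocatablePathSketch {

    // Express an absolute file path relative to a base directory.
    static String toRelative(String baseDir, String absoluteFile) {
        Path base = Paths.get(baseDir).normalize();
        Path file = Paths.get(absoluteFile).normalize();
        return base.relativize(file).toString();
    }

    // True if the relative reference climbs out of its base directory.
    static boolean escapesBase(String relativePath) {
        return Paths.get(relativePath).normalize().startsWith("..");
    }

    // Resolve a stored relative reference against the relocated base directory.
    static String resolve(String newBaseDir, String relativePath) {
        return Paths.get(newBaseDir).resolve(relativePath).normalize().toString();
    }

    public static void main(String[] args) {
        String root = "/user-defined-checkpoint-dir";
        String job2 = root + "/job_id_2";
        String own = job2 + "/shared/file-b";               // hypothetical file name
        String inherited = root + "/job_id_1/shared/file-a"; // hypothetical file name

        for (String f : new String[] {own, inherited}) {
            String rel = toRelative(job2, f);
            System.out.println(rel
                + " | escapes job dir: " + escapesBase(rel)
                + " | after move: " + resolve("/new-cluster/ckpt/job_id_2", rel));
        }
    }
}
```

Under these assumptions, relativizing against {{/user-defined-checkpoint-dir}} instead of the job directory would keep both references free of "..", at the cost of having to move the whole checkpoint root together.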
> Make checkpoint relocatable
> ---------------------------
>
> Key: FLINK-24149
> URL: https://issues.apache.org/jira/browse/FLINK-24149
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Reporter: Feifan Wang
> Priority: Major
> Labels: pull-request-available
>
> h3. Background
> FLINK-5763 proposed making savepoints relocatable, and checkpoints have
> similar requirements. For example, jobs can be migrated to another HDFS
> cluster through a savepoint, but we prefer to use persistent checkpoints,
> especially because RocksDBStateBackend incremental checkpoints perform
> better than savepoints during snapshot and restore.
>
> FLINK-8531 standardized the directory layout:
> {code:java}
> /user-defined-checkpoint-dir
> |
> + 1b080b6e710aabbef8993ab18c6de98b (job's ID)
> |
> + --shared/
> + --taskowned/
> + --chk-00001/
> + --chk-00002/
> + --chk-00003/
> ...
> {code}
> * State backend will create a subdirectory with the job's ID that will
> contain the actual checkpoints, such as:
> user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/
> * Each checkpoint individually will store all its files in a subdirectory
> that includes the checkpoint number, such as:
> user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/chk-00003/
> * Files shared between checkpoints will be stored in the shared/ directory
> in the same parent directory as the separate checkpoint directory, such as:
> user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/shared/
> * Similar to shared files, files owned strictly by tasks will be stored in
> the taskowned/ directory in the same parent directory as the separate
> checkpoint directory, such as:
> user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/taskowned/
> h3. Proposal
> Since an individual checkpoint directory does not contain the complete state
> data, we cannot make it relocatable, but its parent directory can be. The only
> work left is to make the metadata file reference files by relative path.
> I propose making these changes to _*FsCheckpointStateOutputStream*_:
> * introduce a _*checkpointDirectory*_ field and remove the
> *_allowRelativePaths_* field
> * introduce an *_entropyInjecting_* field
> * *_closeAndGetHandle()_* returns a _*RelativeFileStateHandle*_ with a path
> relative to _*checkpointDirectory*_ (except for entropy-injecting file systems)
> [~yunta], [~trohrmann], I verified this in our environment and submitted a
> pull request implementing this feature. Please help evaluate whether it is
> appropriate.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)