[ https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feifan Wang updated FLINK-24149:
--------------------------------
    Summary: Make checkpoint self-contained and relocatable  (was: Make checkpoint relocatable)

> Make checkpoint self-contained and relocatable
> ----------------------------------------------
>
>                 Key: FLINK-24149
>                 URL: https://issues.apache.org/jira/browse/FLINK-24149
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>            Reporter: Feifan Wang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2021-09-08-17-06-31-560.png, image-2021-09-08-17-10-28-240.png, image-2021-09-08-17-55-46-898.png, image-2021-09-08-18-01-03-176.png
>
>
> h3. 1. Background
> FLINK-5763 proposed making savepoints relocatable, and checkpoints have similar requirements. For example, when migrating jobs to another HDFS cluster, the migration can be done through a savepoint, but we prefer to use retained checkpoints, because RocksDBStateBackend incremental checkpoints perform better than savepoints during both snapshot and restore.
>
> FLINK-8531 standardized the directory layout:
> {code:java}
> /user-defined-checkpoint-dir
>     |
>     + 1b080b6e710aabbef8993ab18c6de98b (job's ID)
>         |
>         + --shared/
>         + --taskowned/
>         + --chk-00001/
>         + --chk-00002/
>         + --chk-00003/
>         ...
> {code}
> * The state backend creates a subdirectory named after the job's ID that contains the actual checkpoints, such as: user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/
> * Each checkpoint stores all of its files in its own subdirectory whose name includes the checkpoint number, such as: user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/chk-00003/
> * Files shared between checkpoints are stored in the shared/ directory, which has the same parent as the individual checkpoint directories, such as: user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/shared/
> * Similarly to shared files, files owned strictly by tasks are stored in the taskowned/ directory, which also has the same parent as the individual checkpoint directories, such as: user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/taskowned/
>
> h3. 2. Proposal
> Since an individual checkpoint directory does not contain the complete state data (it also references files in shared/ and taskowned/), it cannot be made relocatable on its own, but its parent directory can. The only remaining work is to make the metadata file reference state files by relative paths.
> I propose making these changes to _*FsCheckpointStateOutputStream*_:
> * introduce a _*checkpointDirectory*_ field, and remove the *_allowRelativePaths_* field
> * introduce an *_entropyInjecting_* field
> * make *_closeAndGetHandle()_* return a _*RelativeFileStateHandle*_ whose relative path is based on _*checkpointDirectory*_ (except on entropy-injecting file systems)
>
> [~yunta], [~trohrmann], I verified this in our environment and submitted a pull request that implements this feature. Please help evaluate whether it is appropriate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
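The relative-path idea in the proposal above can be illustrated with a small, self-contained Java sketch. It is not the code from the pull request: it uses java.nio.file.Path as a stand-in for Flink's own org.apache.flink.core.fs.Path, the names RelocatableCheckpointSketch, pathForMetadata and resolveStoredPath are hypothetical, and it assumes POSIX-style paths with all state files living under the job's directory. Under those assumptions, a file in shared/ is recorded relative to chk-N/ as ../shared/<file>, so it still resolves after the whole job directory is copied elsewhere.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch, not Flink's API: records state file locations relative
 * to the checkpoint directory so that the whole job directory can be moved.
 * Assumes POSIX-style paths.
 */
class RelocatableCheckpointSketch {

    /**
     * Returns the path string to write into the metadata file. On an
     * entropy-injecting file system the absolute path is kept, mirroring the
     * exception in the proposal; otherwise the path is relative to checkpointDir.
     */
    static String pathForMetadata(Path checkpointDir, Path stateFile, boolean entropyInjecting) {
        if (entropyInjecting) {
            return stateFile.toString();
        }
        // A file in <job-id>/shared/ seen from <job-id>/chk-N/ becomes ../shared/<file>.
        return checkpointDir.relativize(stateFile).toString();
    }

    /** Resolves a stored path against the (possibly relocated) checkpoint directory. */
    static Path resolveStoredPath(Path checkpointDir, String stored) {
        Path p = Paths.get(stored);
        // Absolute paths (e.g. from entropy-injecting file systems) are used unchanged.
        return p.isAbsolute() ? p : checkpointDir.resolve(p).normalize();
    }

    public static void main(String[] args) {
        Path oldJobDir = Paths.get("/old-cluster/ckpts/1b080b6e710aabbef8993ab18c6de98b");
        Path oldChk = oldJobDir.resolve("chk-00003");
        Path sharedFile = oldJobDir.resolve("shared").resolve("state-file-1");

        String stored = pathForMetadata(oldChk, sharedFile, false);
        System.out.println(stored); // prints ../shared/state-file-1

        // After copying the job directory to another cluster, only the base changes.
        Path newChk = Paths.get("/new-cluster/ckpts/1b080b6e710aabbef8993ab18c6de98b/chk-00003");
        System.out.println(resolveStoredPath(newChk, stored));
        // prints /new-cluster/ckpts/1b080b6e710aabbef8993ab18c6de98b/shared/state-file-1
    }
}
```

Only the job directory's location changes during migration, so the stored ../shared/... path resolves against the new chk-00003/ without rewriting the metadata. The entropy-injecting branch simply keeps absolute paths, mirroring the exception stated in the proposal.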