[ 
https://issues.apache.org/jira/browse/FLINK-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411934#comment-17411934
 ] 

Feifan Wang commented on FLINK-24149:
-------------------------------------

[~yunta] I thought about this issue too simplistically at first; it is indeed 
worth careful discussion.

[~pnowojski], thanks for the reply. In our team, most jobs use RocksDBStateBackend 
with incremental checkpoints, and we prefer retained (incremental) checkpoints to 
savepoints for the following reasons:
 # A savepoint takes much longer than an incremental checkpoint in jobs with 
large state. The figure below shows a job in our production environment: it takes 
nearly 7 minutes to complete a savepoint, while a checkpoint takes only a few 
seconds. (The checkpoint right after a savepoint taking longer is a problem 
described in FLINK-23949.)
!image-2021-09-08-17-55-46-898.png|width=1172,height=261!
 # A savepoint causes excessive CPU usage. The figure below shows the CPU usage 
of the same job as in the figure above:
!image-2021-09-08-18-01-03-176.png|width=1009,height=289!
 # {color:#172b4d}A savepoint with RocksDBStateBackend may cause excessive 
native memory usage and eventually push the TaskManager process memory usage 
over the limit. (We did not investigate the cause further or try to reproduce 
the problem on other large-state jobs; we only increased the overhead memory. 
So this reason is not conclusive.){color}
 # We may need to migrate a job to another HDFS cluster when the cluster it is 
currently running on fails. In this case, there is no chance to trigger a 
savepoint, and triggering savepoints periodically has too much impact on the job.

> Make checkpoint relocatable
> ---------------------------
>
>                 Key: FLINK-24149
>                 URL: https://issues.apache.org/jira/browse/FLINK-24149
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>            Reporter: Feifan Wang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2021-09-08-17-06-31-560.png, 
> image-2021-09-08-17-10-28-240.png, image-2021-09-08-17-55-46-898.png, 
> image-2021-09-08-18-01-03-176.png
>
>
> h3. Background
> FLINK-5763 proposed making savepoints relocatable; checkpoints have similar 
> requirements. For example, migrating jobs to other HDFS clusters can be 
> achieved through a savepoint, but we prefer to use persistent checkpoints, 
> especially because RocksDBStateBackend incremental checkpoints perform 
> better than savepoints during snapshot and restore.
>  
> FLINK-8531 standardized the directory layout:
> {code:java}
> /user-defined-checkpoint-dir
>     |
>     + 1b080b6e710aabbef8993ab18c6de98b (job's ID)
>         |
>         + --shared/
>         + --taskowned/
>         + --chk-00001/
>         + --chk-00002/
>         + --chk-00003/
>         ...
> {code}
>  * State backend will create a subdirectory with the job's ID that will 
> contain the actual checkpoints, such as: 
> user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/
>  * Each checkpoint individually will store all its files in a subdirectory 
> that includes the checkpoint number, such as: 
> user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/chk-00003/
>  * Files shared between checkpoints will be stored in the shared/ directory 
> in the same parent directory as the separate checkpoint directory, such as: 
> user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/shared/
>  * Similar to shared files, files owned strictly by tasks will be stored in 
> the taskowned/ directory in the same parent directory as the separate 
> checkpoint directory, such as: 
> user-defined-checkpoint-dir/1b080b6e710aabbef8993ab18c6de98b/taskowned/
> h3. Proposal
> Since an individual checkpoint directory does not contain the complete state 
> data, we cannot make it relocatable, but its parent directory can be. The only 
> work left is to make the metadata file reference relative file paths.
> I propose making these changes to _*FsCheckpointStateOutputStream*_ :
>  * introduce a _*checkpointDirectory*_ field, and remove the 
> *_allowRelativePaths_* field
>  * introduce an *_entropyInjecting_* field
>  * *_closeAndGetHandle()_* returns a _*RelativeFileStateHandle*_ with a relative 
> path based on _*checkpointDirectory*_ (except for entropy-injecting file systems)
> [~yunta], [~trohrmann], I verified this in our environment and submitted a 
> pull request to implement this feature. Please help evaluate whether it is 
> appropriate.
>  
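
To make the relative-path idea in the quoted proposal concrete, here is a minimal sketch of the path arithmetic only. It is not the code from the pull request and does not use Flink's classes: the class name RelativeCheckpointPathSketch, both method names, the file name state-file-1 and the /new-cluster-dir location are hypothetical, and it assumes that references are stored relative to the job-ID directory (the parent of chk-N/, shared/ and taskowned/).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration; NOT Flink's FsCheckpointStateOutputStream.
final class RelativeCheckpointPathSketch {

    // Turn an absolute state file path into a path relative to the
    // checkpoint base directory (assumed here to be the job-ID directory).
    static String toRelative(String checkpointDirectory, String stateFile) {
        Path base = Paths.get(checkpointDirectory).normalize();
        Path file = Paths.get(stateFile).normalize();
        if (!file.startsWith(base)) {
            throw new IllegalArgumentException(
                    stateFile + " is not located under " + checkpointDirectory);
        }
        return base.relativize(file).toString().replace('\\', '/');
    }

    // Resolve a stored relative reference against the (possibly moved)
    // checkpoint base directory at restore time.
    static String resolve(String newCheckpointDirectory, String relativePath) {
        return Paths.get(newCheckpointDirectory)
                .resolve(relativePath)
                .normalize()
                .toString()
                .replace('\\', '/');
    }

    public static void main(String[] args) {
        String jobId = "1b080b6e710aabbef8993ab18c6de98b";
        String oldBase = "/user-defined-checkpoint-dir/" + jobId;
        // Hypothetical shared state file written by an incremental checkpoint.
        String relative = toRelative(oldBase, oldBase + "/shared/state-file-1");
        System.out.println(relative);
        // After copying the whole job-ID directory to another location,
        // the same relative reference resolves against the new base.
        System.out.println(resolve("/new-cluster-dir/" + jobId, relative));
    }
}
```

Because only the relative part is stored, copying the whole job-ID directory keeps every reference valid. A path that entropy injection places outside the base directory is rejected by toRelative in this sketch, which is in line with the exception for entropy-injecting file systems noted in the proposal.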



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
