[jira] [Comment Edited] (FLINK-23317) Only keep the latest checkpoint in CompletedCheckpointStore

Nicolaus Weidner (Jira) Wed, 28 Jul 2021 07:24:06 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388768#comment-17388768
 ]


Nicolaus Weidner edited comment on FLINK-23317 at 7/28/21, 2:23 PM:
--------------------------------------------------------------------

[~dmvk] My plan would be as follows:
 * Instead of the list of all checkpoints, keep an object containing only up to 
two checkpoints in state (the latest checkpoint/savepoint, and possibly one 
additional checkpoint if the latest one is a savepoint)
 * Replace/rename {{recover()}} with {{synchronizeWithStore()}} (name not 
final). This method would retrieve the checkpoint state handles, check 
whether our local checkpoints are still up to date, and download the most 
recent ones if they are not
 * When {{getAllCheckpoints()}} is called, all checkpoints have to be 
downloaded so we can register shared state (except possibly the one or two we 
already have in memory)
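
A minimal sketch of the holder described in the first bullet, to make the 
"up to two checkpoints" rule concrete. All names here (CheckpointRef, 
LatestCheckpoints, forRecovery) are hypothetical and not part of Flink's API, 
and the handling of {{preferCheckpoint}} is only my reading of the issue 
description:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a completed checkpoint; not Flink's CompletedCheckpoint. */
final class CheckpointRef {
    final long id;
    final boolean savepoint;

    CheckpointRef(long id, boolean savepoint) {
        this.id = id;
        this.savepoint = savepoint;
    }
}

/**
 * Keeps at most two entries: the latest checkpoint/savepoint and, if that one
 * is a savepoint, the most recent regular checkpoint before it.
 */
final class LatestCheckpoints {
    private CheckpointRef latest;           // most recent checkpoint or savepoint
    private CheckpointRef latestCheckpoint; // most recent non-savepoint (may be the same object)

    /** Records a newly completed checkpoint; older or duplicate ids are ignored. */
    void add(CheckpointRef c) {
        if (latest != null && c.id <= latest.id) {
            return;
        }
        latest = c;
        if (!c.savepoint) {
            latestCheckpoint = c;
        }
    }

    /** Returns the retained entries in ascending id order (size 0, 1 or 2). */
    List<CheckpointRef> retained() {
        List<CheckpointRef> out = new ArrayList<>(2);
        if (latestCheckpoint != null && latestCheckpoint != latest) {
            out.add(latestCheckpoint);
        }
        if (latest != null) {
            out.add(latest);
        }
        return out;
    }

    /** Assumed semantics: with preferCheckpoint, skip a newer savepoint if a checkpoint exists. */
    Optional<CheckpointRef> forRecovery(boolean preferCheckpoint) {
        if (preferCheckpoint && latestCheckpoint != null) {
            return Optional.of(latestCheckpoint);
        }
        return Optional.ofNullable(latest);
    }
}
```

With this shape, a second entry is only held while the latest entry is a 
savepoint; as soon as a newer regular checkpoint completes, the holder drops 
back to a single entry.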

One additional idea I had was to also keep the list of state handles in memory 
so we don't have to query it in {{getAllCheckpoints()}}, but I am not sure this 
is worth it: apart from recovery, the only caller of that method is 
{{CheckpointCoordinator#getSuccessfulCheckpoints}}, and I don't see any usage 
of that.

Does this fit what you had in mind in 
https://issues.apache.org/jira/browse/FLINK-22483?focusedCommentId=17388551&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17388551?


> Only keep the latest checkpoint in CompletedCheckpointStore
> -----------------------------------------------------------
>
>                 Key: FLINK-23317
>                 URL: https://issues.apache.org/jira/browse/FLINK-23317
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: David Morávek
>            Assignee: Nicolaus Weidner
>            Priority: Minor
>
> Issue based on the discussion from FLINK-22483.
> We can lower the memory footprint of CompletedCheckpointStore by keeping only 
> the latest checkpoint / savepoint, which will be used for recovery. We need to 
> respect `preferCheckpoint`, but the general idea stays the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
