[
https://issues.apache.org/jira/browse/FLINK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388768#comment-17388768
]
Nicolaus Weidner edited comment on FLINK-23317 at 7/28/21, 2:23 PM:
--------------------------------------------------------------------
[~dmvk] My plan would be as follows:
* Instead of the list of all checkpoints, keep an object containing only up to
two checkpoints in state (the latest checkpoint/savepoint, and possibly one
additional checkpoint if the latest one is a savepoint)
* Replace/rename {{recover()}} with {{synchronizeWithStore()}} (name not
final...). This method would retrieve the checkpoint state handles, check
whether our local checkpoints are still up to date, and download the most
recent ones if they are not
* When {{getAllCheckpoints()}} is called, all checkpoints have to be downloaded
so we can register shared state (maybe except the one/two we have in memory)
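To make the first two points concrete, here is a rough, self-contained sketch. It is not Flink code: {{CheckpointRef}}, {{RemoteStore}} and {{LatestCheckpoints}} are hypothetical stand-ins, and the sketch assumes that checkpoint ids increase over time and that we only learn whether an entry is a savepoint after downloading it. The "up to date" check is also simplified to comparing the newest id only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a completed checkpoint; not Flink's class. */
final class CheckpointRef {
    final long id;
    final boolean savepoint;

    CheckpointRef(long id, boolean savepoint) {
        this.id = id;
        this.savepoint = savepoint;
    }
}

/** Hypothetical store view: listing ids is cheap, downloading is expensive. */
interface RemoteStore {
    List<Long> listIds(); // assumed to be sorted in ascending order

    CheckpointRef download(long id);
}

/** Keeps at most two entries: the latest checkpoint/savepoint, plus the
 *  latest regular checkpoint if the latest entry is a savepoint. */
final class LatestCheckpoints {
    private CheckpointRef latest;           // newest entry of any kind
    private CheckpointRef latestCheckpoint; // newest non-savepoint entry

    void add(CheckpointRef c) {
        if (latest == null || c.id > latest.id) {
            latest = c;
        }
        if (!c.savepoint && (latestCheckpoint == null || c.id > latestCheckpoint.id)) {
            latestCheckpoint = c;
        }
    }

    /** Picks the entry to restore from, honoring preferCheckpoint. */
    Optional<CheckpointRef> forRecovery(boolean preferCheckpoint) {
        if (preferCheckpoint && latestCheckpoint != null) {
            return Optional.of(latestCheckpoint);
        }
        return Optional.ofNullable(latest);
    }

    /** The one or two entries currently held in memory, oldest first. */
    List<CheckpointRef> retained() {
        List<CheckpointRef> out = new ArrayList<>();
        if (latestCheckpoint != null && latestCheckpoint != latest) {
            out.add(latestCheckpoint);
        }
        if (latest != null) {
            out.add(latest);
        }
        return out;
    }

    /** Re-reads the id list and downloads only if the local view is stale. */
    void synchronizeWithStore(RemoteStore store) {
        List<Long> ids = store.listIds();
        if (ids.isEmpty()) {
            latest = null;
            latestCheckpoint = null;
            return;
        }
        long newest = ids.get(ids.size() - 1);
        if (latest != null && latest.id == newest) {
            return; // simplified check: the newest id is unchanged
        }
        latest = null;
        latestCheckpoint = null;
        // Walk backwards from the newest entry until a regular checkpoint is found.
        for (int i = ids.size() - 1; i >= 0 && latestCheckpoint == null; i--) {
            add(store.download(ids.get(i)));
        }
    }
}
```

With a store holding checkpoints 1 and 2 followed by savepoint 3, the first synchronization would download only entries 3 and 2, and a second call would download nothing as long as the newest id is unchanged.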
One additional idea I had was to also keep the list of state handles in memory,
so we don't have to query it in {{getAllCheckpoints()}}, but I am not sure this
is worth it (I don't see any usage of
{{CheckpointCoordinator#getSuccessfulCheckpoints}}, which in turn is the only
caller of {{getAllCheckpoints()}} apart from recovery).
Does this fit what you had in mind in your comment on FLINK-22483?
https://issues.apache.org/jira/browse/FLINK-22483?focusedCommentId=17388551&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17388551
> Only keep the latest checkpoint in CompletedCheckpointStore
> -----------------------------------------------------------
>
> Key: FLINK-23317
> URL: https://issues.apache.org/jira/browse/FLINK-23317
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: David Morávek
> Assignee: Nicolaus Weidner
> Priority: Minor
>
> Issue based on the discussion from FLINK-22483.
> We can lower the memory footprint of CompletedCheckpointStore by keeping only
> the latest checkpoint / savepoint, which will be used for recovery. We need to
> respect `preferCheckpoint`, but the overall idea stays the same.