[
https://issues.apache.org/jira/browse/FLINK-12144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813326#comment-16813326
]
Gyula Fora commented on FLINK-12144:
------------------------------------
yes, thanks I will close this issue :)
> Option to prefer checkpoints on recovery
> ----------------------------------------
>
> Key: FLINK-12144
> URL: https://issues.apache.org/jira/browse/FLINK-12144
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing
> Reporter: Gyula Fora
> Assignee: vinoyang
> Priority: Trivial
>
> When a streaming job fails the getLatestCheckpoint() of the CheckpointStore
> is used to determine which checkpoint or savepoint is going to be used for
> recovery.
> This behaviour is perfectly fine for jobs with relatively small states or
> where there are no strong SLAs but it some cases it can be problematic.
> For jobs with a very large state size, the difference between recovery times
> from savepoints and checkpoints can be substantial to the point where it
> might break a use-case. So we would like to avoid ever recovering from a
> savepoint if a not too old checkpoint is also readily available.
> This cannot be avoided right now if a job fails after we took a savepoint
> maybe for backup purposes (maybe it is scheduled multiple times a day).
> I suggest we add a configuration option to allow the job to fall back to an
> earlier checkpoint (within maybe a certain age limit) even if there is a
> newer savepoint available to avoid lengthy downtimes.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)