[
https://issues.apache.org/jira/browse/FLINK-11159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746405#comment-16746405
]
vinoyang commented on FLINK-11159:
----------------------------------
If the user enables this option, we can think of it as a "dynamic (not
periodic) checkpoint". It enables the “pause/resume” function in a faster and
more efficient way. Of course, the other roles of a savepoint (such as
version upgrades, etc.) still exist. I think we really need this feature. If
you agree, I am willing to provide a design document for it. What do you
think about the idea? cc [~till.rohrmann] [~Zentol]
> Allow configuration whether to fall back to savepoints for restore
> ------------------------------------------------------------------
>
> Key: FLINK-11159
> URL: https://issues.apache.org/jira/browse/FLINK-11159
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing
> Affects Versions: 1.5.5, 1.6.2, 1.7.0
> Reporter: Nico Kruber
> Assignee: vinoyang
> Priority: Major
>
> Ever since FLINK-3397, upon failure, Flink restarts from the latest
> checkpoint or savepoint, whichever is more recent. With the introduction of
> local recovery and the knowledge that a RocksDB checkpoint restore would just
> copy the files, it may be time to reconsider this and make it configurable:
> In certain situations, it may be faster to restore from the latest checkpoint
> only (even if there is a more recent savepoint) and reprocess the data in
> between. On the downside, though, that may not be correct, because it might
> break side effects if the savepoint was the latest one, e.g. consider this
> chain: {{chk1 -> chk2 -> sp … restore chk2 -> …}}. Then all side effects
> between {{chk2 -> sp}} would be reproduced.
> Making this configurable will allow users to choose whatever they need (or
> can tolerate) to get the lowest recovery time in Flink.
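
To make the trade-off in the description concrete, here is a minimal sketch of the selection logic for the chain chk1 -> chk2 -> sp. It is not Flink code: the class RestorePointSelector, the Snapshot type, and the preferCheckpoint flag are all hypothetical names chosen for illustration, and the fallback to the latest snapshot when no checkpoint exists is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class RestorePointSelector {

    /** Hypothetical completed snapshot; a higher id means it was taken later. */
    public static final class Snapshot {
        public final long id;
        public final boolean savepoint;

        public Snapshot(long id, boolean savepoint) {
            this.id = id;
            this.savepoint = savepoint;
        }
    }

    /**
     * Picks the snapshot to restore from.
     * preferCheckpoint = false: latest snapshot of any kind (behaviour described since FLINK-3397).
     * preferCheckpoint = true: latest checkpoint, ignoring more recent savepoints.
     * Assumption: if no checkpoint exists, fall back to the latest snapshot.
     */
    public static Optional<Snapshot> select(List<Snapshot> completed, boolean preferCheckpoint) {
        Comparator<Snapshot> byId = Comparator.comparingLong(s -> s.id);
        if (preferCheckpoint) {
            Optional<Snapshot> latestCheckpoint =
                    completed.stream().filter(s -> !s.savepoint).max(byId);
            if (latestCheckpoint.isPresent()) {
                return latestCheckpoint;
            }
        }
        return completed.stream().max(byId);
    }

    public static void main(String[] args) {
        // chk1 -> chk2 -> sp
        List<Snapshot> history = Arrays.asList(
                new Snapshot(1, false), new Snapshot(2, false), new Snapshot(3, true));

        System.out.println(select(history, false).get().id); // restores from sp (id 3)
        System.out.println(select(history, true).get().id);  // restores from chk2 (id 2)
    }
}
```

With preferCheckpoint set, the job resumes from chk2, so everything processed between chk2 and sp is replayed, including any side effects. That is the correctness risk the description mentions.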