[ https://issues.apache.org/jira/browse/FLINK-11159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818847#comment-16818847 ]
Till Rohrmann commented on FLINK-11159:
---------------------------------------
I think savepoints were excluded from recovery for some time by FLINK-6328.
However, since savepoints currently also manifest side effects in external
systems, we reverted that change with FLINK-10354.
There was some discussion about giving savepoints different flush semantics
to solve this problem: normal savepoints would not flush side effects,
whereas stop-with-savepoint would trigger the manifestation of side effects
(I couldn't find the thread).
As long as we don't have this feature, I fear that we can only offer an option
to opt out of using savepoints for recovery. Otherwise we might break
exactly-once processing guarantees for everyone who uses savepoints and
manifests side effects.
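To illustrate the opt-out, here is a minimal sketch of how the restore point
could be selected. It is not Flink's actual implementation: the class
RestoreSelection, the Snapshot type, and the flag useSavepointsForRecovery are
hypothetical names, and the sketch assumes that checkpoints and savepoints
share one monotonically increasing ID, so that a higher ID means a more recent
snapshot.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Flink code: choose the snapshot to restore from.
final class RestoreSelection {

    // Simplified stand-in for a completed checkpoint or savepoint.
    static final class Snapshot {
        final long id;             // assumed to increase with completion time
        final boolean savepoint;   // true for savepoints, false for checkpoints

        Snapshot(long id, boolean savepoint) {
            this.id = id;
            this.savepoint = savepoint;
        }
    }

    // With useSavepointsForRecovery = true (the behavior described in the
    // issue), the most recent snapshot of either kind wins. With false,
    // savepoints are skipped and the most recent checkpoint wins.
    static Optional<Snapshot> selectRestorePoint(
            List<Snapshot> completed, boolean useSavepointsForRecovery) {
        return completed.stream()
                .filter(s -> useSavepointsForRecovery || !s.savepoint)
                .max(Comparator.comparingLong((Snapshot s) -> s.id));
    }

    public static void main(String[] args) {
        // The chain from the issue description: chk1 -> chk2 -> sp
        List<Snapshot> history = Arrays.asList(
                new Snapshot(1L, false),
                new Snapshot(2L, false),
                new Snapshot(3L, true));

        System.out.println(selectRestorePoint(history, true).get().id);  // prints 3
        System.out.println(selectRestorePoint(history, false).get().id); // prints 2
    }
}
```

With the flag set to false, the job would restore from chk2 and reprocess
everything between chk2 and sp, which is exactly where already-manifested
side effects could be reproduced.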
> Allow configuration whether to fall back to savepoints for restore
> ------------------------------------------------------------------
>
> Key: FLINK-11159
> URL: https://issues.apache.org/jira/browse/FLINK-11159
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.5.5, 1.6.2, 1.7.0
> Reporter: Nico Kruber
> Assignee: vinoyang
> Priority: Major
>
> Ever since FLINK-3397, upon failure, Flink restarts from the latest
> checkpoint or savepoint, whichever is more recent. With the introduction of
> local recovery and the knowledge that a RocksDB checkpoint restore would just
> copy the files, it may be time to reconsider this and make it configurable:
> In certain situations, it may be faster to restore from the latest checkpoint
> only (even if there is a more recent savepoint) and reprocess the data in
> between. On the downside, though, this may not be correct: if the savepoint
> was the latest snapshot, restoring from an older checkpoint might break side
> effects. Consider, e.g., this chain:
> {{chk1 -> chk2 -> sp … restore chk2 -> …}}. Then all side effects
> between {{chk2 -> sp}} would be reproduced.
> Making this configurable will allow users to choose whatever they need or
> can tolerate to get the lowest recovery time in Flink.