Stephan Ewen created FLINK-4818:
-----------------------------------
Summary: RestartStrategy should track how many failed restore
attempts the same checkpoint has and fall back to earlier checkpoints
Key: FLINK-4818
URL: https://issues.apache.org/jira/browse/FLINK-4818
Project: Flink
Issue Type: Sub-task
Components: Distributed Coordination
Reporter: Stephan Ewen
The restart strategies can use the exception information from FLINK-4816 to
keep track of how often restoring a checkpoint has failed. After a certain
number of consecutive failures, they should fall back to an earlier completed
checkpoint as the recovery point.
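A minimal sketch of the bookkeeping described above, assuming the list of completed checkpoint IDs is known up front and that a single failure threshold applies to every checkpoint. The class name `CheckpointFallbackTracker` and its methods are hypothetical and are not part of Flink's API; the sketch does not show how the failure information from FLINK-4816 would be delivered to it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not a Flink API): counts consecutive failed restore
 * attempts per checkpoint and falls back to an earlier completed checkpoint
 * once a threshold is reached.
 */
public class CheckpointFallbackTracker {

    private final int maxFailedRestoresPerCheckpoint;
    // Completed checkpoint IDs, oldest first (assumed ascending).
    private final List<Long> completedCheckpoints;
    // Consecutive failed restore attempts, keyed by checkpoint ID.
    private final Map<Long, Integer> failedRestores = new HashMap<>();
    // Index into completedCheckpoints of the checkpoint to restore next.
    private int currentIndex;

    public CheckpointFallbackTracker(List<Long> completedCheckpoints, int maxFailedRestoresPerCheckpoint) {
        if (completedCheckpoints.isEmpty()) {
            throw new IllegalArgumentException("no completed checkpoints");
        }
        if (maxFailedRestoresPerCheckpoint < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.completedCheckpoints = new ArrayList<>(completedCheckpoints);
        this.maxFailedRestoresPerCheckpoint = maxFailedRestoresPerCheckpoint;
        this.currentIndex = this.completedCheckpoints.size() - 1; // start with the latest
    }

    /** ID of the checkpoint the next restart should restore from. */
    public long checkpointToRestore() {
        return completedCheckpoints.get(currentIndex);
    }

    /**
     * Records a failed restore of the current checkpoint. Returns true if a
     * checkpoint is still available to try, false if all are exhausted.
     */
    public boolean onRestoreFailure() {
        long id = checkpointToRestore();
        int failures = failedRestores.merge(id, 1, Integer::sum);
        if (failures >= maxFailedRestoresPerCheckpoint) {
            if (currentIndex == 0) {
                return false; // no earlier checkpoint to fall back to
            }
            currentIndex--; // fall back to the next earlier checkpoint
        }
        return true;
    }

    /** A successful restore resets the consecutive-failure count. */
    public void onRestoreSuccess() {
        failedRestores.remove(checkpointToRestore());
    }

    public static void main(String[] args) {
        CheckpointFallbackTracker tracker =
                new CheckpointFallbackTracker(List.of(1L, 2L, 3L), 2);
        System.out.println(tracker.checkpointToRestore()); // prints 3
        tracker.onRestoreFailure();
        System.out.println(tracker.checkpointToRestore()); // prints 3 (one failure so far)
        tracker.onRestoreFailure();
        System.out.println(tracker.checkpointToRestore()); // prints 2 (fell back)
    }
}
```

Keeping the counter separate from any particular restart strategy is one possible design; the same logic could equally live inside a restart strategy or the checkpoint coordinator.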
It is open for discussion whether the restart strategies are the right place to
implement this, or whether it is an orthogonal feature that belongs in the
checkpoint coordinator (which knows how many checkpoints are available) or in
a separate class altogether.