[ https://issues.apache.org/jira/browse/FLINK-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361248#comment-15361248 ]
Stephan Ewen commented on FLINK-3397:
-------------------------------------
To make sure everyone is on the same page: the system currently recovers from
failures using the savepoint if the program was started from that savepoint.
The remaining issue is the case where a savepoint was taken but the job was
not resumed from it. In that case, the savepoint is not used for recovery.
To get this right, we should discuss the general relationship between
checkpoints and savepoints. In my opinion, this warrants a design doc before
any code is written.
> Failed streaming jobs should fall back to the most recent checkpoint/savepoint
> ------------------------------------------------------------------------------
>
> Key: FLINK-3397
> URL: https://issues.apache.org/jira/browse/FLINK-3397
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.0.0
> Reporter: Gyula Fora
> Priority: Minor
>
> The current fallback behaviour in case of a streaming job failure is slightly
> counterintuitive:
> If a job fails, it falls back to the most recent checkpoint (if any), even
> if a more recent savepoint was taken. This means that the system does not
> regard savepoints as checkpoints, only as points from which a job can be
> manually restarted.
> I suggest changing this so that savepoints are also regarded as checkpoints
> in case of a failure and are also used to automatically restore the
> streaming job.
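The fallback rule under discussion can be modeled as a small selection function. This is a hypothetical sketch, not Flink's actual recovery code: `RestorePoint`, `select_restore_point`, the `proposed` flag, and the use of a single increasing `snapshot_id` to order snapshots by recency are all assumptions made for illustration. It encodes the current behaviour as described in the comment (checkpoints, plus the savepoint the job was started from) and the proposed behaviour (every savepoint is also a candidate).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RestorePoint:
    # Hypothetical snapshot record (not a Flink class).
    kind: str                   # "checkpoint" or "savepoint"
    snapshot_id: int            # assumed: higher id means more recent
    resumed_from: bool = False  # True if the job was started from this savepoint


def is_candidate(point: RestorePoint, proposed: bool) -> bool:
    # Checkpoints are always eligible for automatic recovery.
    if point.kind == "checkpoint":
        return True
    # Current behaviour: only the savepoint the job was started from counts.
    # Proposed behaviour: any savepoint counts.
    return point.resumed_from or proposed


def select_restore_point(points: Iterable[RestorePoint],
                         proposed: bool) -> Optional[RestorePoint]:
    # Among eligible snapshots, the most recent one wins; None if there are none.
    candidates = [p for p in points if is_candidate(p, proposed)]
    return max(candidates, key=lambda p: p.snapshot_id, default=None)


# Job started from savepoint 1, then checkpoint 2 completed,
# then savepoint 3 was taken manually but never resumed from.
history = [
    RestorePoint("savepoint", 1, resumed_from=True),
    RestorePoint("checkpoint", 2),
    RestorePoint("savepoint", 3),
]
current = select_restore_point(history, proposed=False)   # checkpoint 2
suggested = select_restore_point(history, proposed=True)  # savepoint 3
```

In this model the two behaviours share the same "most recent wins" rule and differ only in which savepoints are eligible, which is exactly the case (a savepoint taken but not resumed from) left open in the comment.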