[
https://issues.apache.org/jira/browse/FLINK-10751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16716654#comment-16716654
]
ASF GitHub Bot commented on FLINK-10751:
----------------------------------------
uce commented on issue #7006: [FLINK-10751] [runtime] Retain checkpoints on
suspension
URL: https://github.com/apache/flink/pull/7006#issuecomment-446132928
Closing this as it should be properly addressed at some other point in time.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Checkpoints should be retained when job reaches suspended state
> ---------------------------------------------------------------
>
> Key: FLINK-10751
> URL: https://issues.apache.org/jira/browse/FLINK-10751
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.6.2, 1.7.0
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Priority: Minor
> Labels: pull-request-available
>
> {{CheckpointProperties}} define in which terminal job statuses a checkpoint
> should be disposed of.
> I've noticed that the properties for {{CHECKPOINT_NEVER_RETAINED}},
> {{CHECKPOINT_RETAINED_ON_FAILURE}} prescribe checkpoint disposal in (locally)
> terminal job status {{SUSPENDED}}.
> Since a job reaches the {{SUSPENDED}} state when its {{JobMaster}} loses
> leadership, this would result in the checkpoint being cleaned up and not
> being available for recovery by the new leader. Therefore, we should rather
> retain checkpoints when reaching job status {{SUSPENDED}}.
> *BUT:* Because we special-case this terminal state in the only highly
> available {{CompletedCheckpointStore}} implementation (see
> [ZooKeeperCompletedCheckpointStore|https://github.com/apache/flink/blob/e7ac3ba/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L315])
> and don't use regular checkpoint disposal, this issue has not surfaced yet.
> I think we should proactively fix the properties so that they indicate that
> checkpoints are retained in {{SUSPENDED}} state. We might actually be able to
> remove this case completely, since with this change all properties will
> indicate retention on suspension.
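
The proposal in the description above can be sketched roughly as follows. This is a minimal, hypothetical model and not Flink's actual CheckpointProperties class: the names RetentionSketch, Policy, shouldDiscard and retainOnSuspension are invented for illustration, and the sets of statuses assigned to the two policies are an assumption beyond the one fact stated in the issue (both currently dispose on SUSPENDED).

```java
import java.util.EnumSet;

// Hypothetical, simplified model for illustration only.
// This is NOT Flink's CheckpointProperties class or its API.
final class RetentionSketch {

    // The job statuses discussed in the issue.
    enum JobStatus { FINISHED, CANCELED, FAILED, SUSPENDED }

    // A policy is modelled as the set of job statuses that trigger disposal.
    static final class Policy {
        private final EnumSet<JobStatus> discardOn;

        Policy(EnumSet<JobStatus> discardOn) {
            this.discardOn = EnumSet.copyOf(discardOn); // defensive copy
        }

        // True if a checkpoint would be disposed of in the given job status.
        boolean shouldDiscard(JobStatus status) {
            return discardOn.contains(status);
        }

        // Proposed behaviour: same policy, but never dispose on SUSPENDED.
        Policy retainOnSuspension() {
            EnumSet<JobStatus> copy = EnumSet.copyOf(discardOn);
            copy.remove(JobStatus.SUSPENDED);
            return new Policy(copy);
        }
    }

    // Assumed current behaviour of the two properties named in the issue.
    static final Policy NEVER_RETAINED =
            new Policy(EnumSet.allOf(JobStatus.class));
    static final Policy RETAINED_ON_FAILURE = new Policy(
            EnumSet.of(JobStatus.FINISHED, JobStatus.CANCELED, JobStatus.SUSPENDED));

    public static void main(String[] args) {
        for (Policy p : new Policy[] {NEVER_RETAINED, RETAINED_ON_FAILURE}) {
            System.out.println(
                    "current discards on SUSPENDED: "
                            + p.shouldDiscard(JobStatus.SUSPENDED)
                            + ", proposed: "
                            + p.retainOnSuspension().shouldDiscard(JobStatus.SUSPENDED));
        }
    }
}
```

In this sketch, once every policy retains on suspension, the SUSPENDED entry carries no information, which is the reason the description suggests the case might be removed altogether.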
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)