[
https://issues.apache.org/jira/browse/FLINK-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-10855:
-----------------------------------
Labels: auto-deprioritized-major auto-unassigned (was: auto-unassigned
stale-major)
Priority: Minor (was: Major)
This issue was labeled "stale-major" 7 days ago and has not received any
updates, so it is being deprioritized. If this ticket is actually Major, please
raise the priority and ask a committer to assign you the issue or revive the
public discussion.
> CheckpointCoordinator does not delete checkpoint directory of late/failed
> checkpoints
> -------------------------------------------------------------------------------------
>
> Key: FLINK-10855
> URL: https://issues.apache.org/jira/browse/FLINK-10855
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.5.5, 1.6.2, 1.7.0
> Reporter: Till Rohrmann
> Priority: Minor
> Labels: auto-deprioritized-major, auto-unassigned
>
> If an acknowledge checkpoint message arrives late, or a checkpoint cannot
> be acknowledged, we discard the subtask state in the
> {{CheckpointCoordinator}}. What does not happen in this case is the deletion
> of the checkpoint's parent directory. That only happens in
> {{PendingCheckpoint#dispose}}.
> As a result, the following can happen: a checkpoint fails (e.g. because a
> task is not ready) and we delete the checkpoint directory. Next, another task
> writes its checkpoint data to the checkpoint directory (thereby creating it
> again) and sends an acknowledge message back to the
> {{CheckpointCoordinator}}. The {{CheckpointCoordinator}} realizes that there
> is no longer a {{PendingCheckpoint}} and discards the subtask state. This
> removes the state files from the checkpoint directory but leaves the
> checkpoint directory itself in place.
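
The sequence described in the issue can be reproduced with plain file
operations. The sketch below is only an illustration under stated assumptions:
it does not use Flink's classes, and the class name, method names, directory
name "chk-42", and file name "subtask-state" are all hypothetical. It shows the
leftover directory and one possible cleanup, which is to remove the parent
directory when discarding the last state file leaves it empty.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the behaviour in the issue; not Flink's API.
public class LateCheckpointCleanupSketch {

    /** Simulates a late task: (re)creates the checkpoint directory and writes one state file. */
    public static Path writeLateSubtaskState(Path checkpointDir) throws IOException {
        Files.createDirectories(checkpointDir);
        return Files.write(checkpointDir.resolve("subtask-state"), new byte[] {1, 2, 3});
    }

    /** Behaviour described in the issue: only the state file is removed. */
    public static void discardStateOnly(Path stateFile) throws IOException {
        Files.deleteIfExists(stateFile);
    }

    /** Assumed fix: also remove the parent directory if it is now empty. */
    public static void discardStateAndEmptyParent(Path stateFile) throws IOException {
        Files.deleteIfExists(stateFile);
        Path parent = stateFile.getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            return;
        }
        try {
            if (isEmpty(parent)) {
                Files.deleteIfExists(parent);
            }
        } catch (DirectoryNotEmptyException e) {
            // Another late writer added a file in the meantime; keep the directory.
        }
    }

    /** Returns true if the directory has no entries. */
    public static boolean isEmpty(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("checkpoints");
        Path checkpointDir = root.resolve("chk-42");

        // The failed checkpoint was already disposed, so its directory does not exist.
        // A late task now recreates it, and the coordinator discards only the state.
        Path state = writeLateSubtaskState(checkpointDir);
        discardStateOnly(state);
        System.out.println("left behind without cleanup: " + Files.exists(checkpointDir));

        // Same sequence, but the discard also removes the empty parent directory.
        Path state2 = writeLateSubtaskState(checkpointDir);
        discardStateAndEmptyParent(state2);
        System.out.println("left behind with cleanup: " + Files.exists(checkpointDir));

        Files.deleteIfExists(root);
    }
}
```

Deleting only when the directory is empty is a deliberate choice in this
sketch: several late tasks may write into the same directory, so an
unconditional recursive delete could remove state that has not been discarded
yet.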
--
This message was sent by Atlassian Jira
(v8.3.4#803005)