[
https://issues.apache.org/jira/browse/FLINK-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769671#comment-15769671
]
Roman Maier commented on FLINK-5086:
------------------------------------
Hi Till Rohrmann, I do not have a concrete plan yet.
Xiaogang Shi's suggestion to clean up dead files when the task recovers
seems reasonable to me.
I have started investigating how the checkpointing mechanism is implemented in Flink.
After that I will proceed to the implementation.
I am new to Flink and do not work very fast yet,
so if the changes are needed urgently, maybe I should change the status
to unassigned, so that I do not block anyone who is willing and able to
implement this issue faster.
If that is necessary, please let me know.
> Clean dead snapshot files produced by the tasks failing to acknowledge
> checkpoints
> ----------------------------------------------------------------------------------
>
> Key: FLINK-5086
> URL: https://issues.apache.org/jira/browse/FLINK-5086
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Reporter: Xiaogang Shi
> Assignee: Roman Maier
>
> A task may fail when performing checkpoints. In that case, the task may have
> already copied some data to external storage. But since the task fails to
> send the state handler to {{CheckpointCoordinator}}, the copied data will not
> be deleted by {{CheckpointCoordinator}}.
> I think we must find a method to clean such dead snapshot data to avoid
> unlimited usage of external storage.
> One possible method is to clean these dead files when the task recovers. When
> a task recovers, {{CheckpointCoordinator}} will tell the task all the
> retained checkpoints. The task then can scan the external storage to delete
> all the snapshots not in these retained checkpoints.
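
The recovery-time cleanup proposed above could be sketched roughly as follows. This is only an illustration and not Flink code: the class `DeadSnapshotCleaner`, the method `cleanDeadSnapshots`, and the assumption that each checkpoint lives in its own `chk-<id>` directory on a file-system-like storage are all hypothetical. The retained checkpoint IDs are assumed to be handed to the task by {{CheckpointCoordinator}} on recovery.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Flink: deletes snapshot directories that are not retained. */
public class DeadSnapshotCleaner {

    // Assumed layout: one directory per checkpoint, named "chk-<id>".
    private static final String PREFIX = "chk-";

    /** Deletes every "chk-<id>" directory under storageRoot whose id is not retained; returns the deleted ids. */
    public static List<Long> cleanDeadSnapshots(Path storageRoot, Set<Long> retainedIds)
            throws IOException {
        // First collect the dead directories, then delete, so the listing is not modified while iterating.
        List<Path> deadDirs = new ArrayList<>();
        List<Long> deadIds = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(storageRoot, PREFIX + "*")) {
            for (Path dir : dirs) {
                if (!Files.isDirectory(dir)) {
                    continue;
                }
                long id;
                try {
                    id = Long.parseLong(dir.getFileName().toString().substring(PREFIX.length()));
                } catch (NumberFormatException e) {
                    continue; // not a checkpoint directory, leave it alone
                }
                if (!retainedIds.contains(id)) {
                    deadDirs.add(dir);
                    deadIds.add(id);
                }
            }
        }
        for (Path dir : deadDirs) {
            deleteRecursively(dir);
        }
        return deadIds;
    }

    private static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

One caveat with this approach: a plain scan cannot tell a dead snapshot from one that belongs to a checkpoint still in progress, so a real implementation would also have to exclude pending checkpoints or run only before new checkpoints are triggered.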
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)