Xiaogang Shi created FLINK-5086:
-----------------------------------
Summary: Clean dead snapshot files produced by the tasks failing
to acknowledge checkpoints
Key: FLINK-5086
URL: https://issues.apache.org/jira/browse/FLINK-5086
Project: Flink
Issue Type: Bug
Components: State Backends, Checkpointing
Reporter: Xiaogang Shi
A task may fail while performing a checkpoint. By that point, the task may have
already copied some data to external storage. But because the failed task never
sends its state handle to {{CheckpointCoordinator}}, the coordinator does not
know about the copied data and will never delete it.
I think we need a way to clean up such dead snapshot data so that the usage of
external storage does not grow without bound.
One possible approach is to clean up these dead files when the task recovers.
When a task recovers, {{CheckpointCoordinator}} will tell the task which
checkpoints are retained. The task can then scan the external storage and delete
every snapshot that does not belong to one of these retained checkpoints.
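The scan-and-delete step proposed above could look roughly like the sketch
below. It is only an illustration, not Flink code: the class and method names
are hypothetical, external storage is assumed to be a local directory, and each
snapshot is assumed to live in a sub-directory named chk-<checkpointId>. The set
of retained checkpoint ids stands in for what {{CheckpointCoordinator}} would
tell the task on recovery.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink.
class DeadSnapshotCleaner {

    private static final String PREFIX = "chk-"; // assumed directory naming

    /**
     * Deletes every snapshot directory under root whose checkpoint id is not
     * in retainedIds. Returns the ids of the snapshots that were deleted.
     */
    public static List<Long> cleanDeadSnapshots(Path root, Set<Long> retainedIds)
            throws IOException {
        // Collect candidates first so nothing is deleted while the listing is open.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root, PREFIX + "*")) {
            for (Path dir : dirs) {
                if (Files.isDirectory(dir)) {
                    candidates.add(dir);
                }
            }
        }

        List<Long> deleted = new ArrayList<>();
        for (Path dir : candidates) {
            long id;
            try {
                id = Long.parseLong(dir.getFileName().toString().substring(PREFIX.length()));
            } catch (NumberFormatException e) {
                continue; // not a snapshot directory, leave it alone
            }
            if (!retainedIds.contains(id)) {
                deleteRecursively(dir);
                deleted.add(id);
            }
        }
        return deleted;
    }

    private static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

A real implementation would also have to make sure the scan does not remove
files of a checkpoint that is still in progress, and it would go through the
file system abstraction of the state backend rather than java.nio.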