[
https://issues.apache.org/jira/browse/FLINK-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116169#comment-17116169
]
Stephan Ewen commented on FLINK-15012:
--------------------------------------
I think there is a big difference between the working/temp directories and the
checkpoint directories.
The working/temp directories can be cleaned up after processes shut down,
because no data in them will ever be needed.
The checkpoint directories may contain retained checkpoints or savepoints that
are still relevant. I think we should not ever try to delete these with things
like "shutdown hooks".
I understand that job cancellation should remove the job's empty parent
checkpoint directories. That makes sense. And [~yunta] proposed an issue to fix
this.
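To make the "only remove what is empty" idea concrete, here is a minimal
sketch of such a cleanup. It is an illustration under stated assumptions, not
Flink's actual implementation: the class EmptyCheckpointDirCleaner and the
method removeIfEmpty are hypothetical names, and it uses plain java.nio.file on
a local path, whereas Flink goes through its own file system abstraction. The
job directory is removed only when nothing but empty "shared" and "taskowned"
directories is left in it; anything else (for example a retained chk-N
directory) makes the method back off without deleting anything.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Flink) illustrating a conservative cleanup. */
public final class EmptyCheckpointDirCleaner {

    private EmptyCheckpointDirCleaner() {}

    /**
     * Removes the job's checkpoint directory only if it contains nothing but
     * empty "shared" / "taskowned" subdirectories. Returns true if the
     * directory was removed, false if anything was left untouched.
     */
    public static boolean removeIfEmpty(Path jobDir) throws IOException {
        if (!Files.isDirectory(jobDir)) {
            return false;
        }
        List<Path> removable = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(jobDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                boolean bookkeeping = name.equals("shared") || name.equals("taskowned");
                // Any unexpected or non-empty entry may belong to a retained
                // checkpoint or savepoint, so nothing is deleted at all.
                if (!bookkeeping || !Files.isDirectory(entry) || !isEmpty(entry)) {
                    return false;
                }
                removable.add(entry);
            }
        }
        for (Path dir : removable) {
            Files.delete(dir); // fails rather than deleting if no longer empty
        }
        Files.delete(jobDir);
        return true;
    }

    private static boolean isEmpty(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }
}
```

The sketch deliberately uses Files.delete, which refuses to remove a non-empty
directory, instead of a recursive delete, so a checkpoint written concurrently
would cause a failure rather than data loss.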
I would question whether we should try and do anything about the
{{stop-cluster.sh}} behavior. This is forceful wiping of the cluster rather
than proper shutdown, so left-over data is to be expected. And, in my mind, the
caution to not accidentally delete a still-needed checkpoint is more important
than making the "hard stop" as nice as possible (cleanup-wise).
> Checkpoint directory not cleaned up
> -----------------------------------
>
> Key: FLINK-15012
> URL: https://issues.apache.org/jira/browse/FLINK-15012
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.9.1
> Reporter: Nico Kruber
> Assignee: Yun Tang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I started a Flink cluster with 2 TMs using {{start-cluster.sh}} and the
> following config (in addition to the default {{flink-conf.yaml}})
> {code:java}
> state.checkpoints.dir: file:///path/to/checkpoints/
> state.backend: rocksdb {code}
> After submitting a job with checkpoints enabled (every 5s), checkpoints show
> up, e.g.
> {code:java}
> bb969f842bbc0ecc3b41b7fbe23b047b/
> ├── chk-2
> │   ├── 238969e1-6949-4b12-98e7-1411c186527c
> │   ├── 2702b226-9cfc-4327-979d-e5508ab2e3d5
> │   ├── 4c51cb24-6f71-4d20-9d4c-65ed6e826949
> │   ├── e706d574-c5b2-467a-8640-1885ca252e80
> │   └── _metadata
> ├── shared
> └── taskowned {code}
> If I shut down the cluster via {{stop-cluster.sh}}, these files will remain
> on disk and not be cleaned up.
> In contrast, if I cancel the job, at least {{chk-2}} will be deleted, but
> still leaving the (empty) directories.