[
https://issues.apache.org/jira/browse/FLINK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776158#comment-17776158
]
Matthias Pohl commented on FLINK-33288:
---------------------------------------
Hi [~xinchen147], thanks for reporting this issue and [~Zhanghao Chen] for
jumping into the discussion.
The job cleanup should deal with all the artifacts of a job, i.e. all of its
artifacts should be removed once the job reaches a globally-terminal state (which
means that it won't be recovered if the Flink cluster restarts). The exception is
the scenario where {{job-result-store.delete-on-commit}} is disabled: there,
entry files are deliberately left behind in the {{JobResultStore}} folder. In
that case, Flink hands over the ownership of those entry files to the user (i.e.
the user is in charge of (re)moving the files). This is the desired behavior, as
it enables recovery in application-mode clusters.
This makes me wonder whether we should do a recursive deletion of empty
folders within the HA storage folder. That would cover the case where all the
artifacts are removed (when all jobs reached a globally-terminal state) and
still protect the scenarios where artifacts are retained
({{job-result-store.delete-on-commit}} being disabled, some jobs not having
reached a globally-terminal state, or some other issue having prevented the
deletion of artifacts). WDYT?
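A minimal sketch of such a recursive cleanup, for illustration only. It assumes
{{java.nio.file}} on a local directory as a stand-in for the HDFS-backed HA
storage folder (code inside Flink would presumably go through Flink's own
{{FileSystem}} abstraction instead), and the class {{EmptyDirCleaner}} and its
method are hypothetical names. Directories are visited bottom-up and one is
deleted only if it is empty after its children have been processed, so any
folder that still holds a retained artifact (for example a {{JobResultStore}}
entry file) survives together with all of its parents.

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: NOT part of Flink, just a sketch of the proposed idea.
final class EmptyDirCleaner {

    private EmptyDirCleaner() {}

    /**
     * Deletes every empty directory below {@code root}, bottom-up.
     * {@code root} itself is always kept, as is any directory that still
     * (transitively) contains a file. Returns the number of directories removed.
     */
    static int deleteEmptyDirs(Path root) throws IOException {
        final int[] removed = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc; // propagate errors hit while listing the directory
                }
                // All children were visited (and possibly deleted) already, so an
                // empty listing here means the whole subtree held no files.
                if (!dir.equals(root) && isEmpty(dir)) {
                    Files.delete(dir);
                    removed[0]++;
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return removed[0];
    }

    private static boolean isEmpty(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }
}
{code}

For example, with a (hypothetical) storage folder containing an empty
{{application_1/blob}} tree next to {{job-result-store/cluster/entry.json}}, the
sketch removes {{application_1/blob}} and {{application_1}} and keeps the entry
file and its parents. It does not guard against concurrent writers: a directory
that a still-running job is about to populate could be deleted in between, so a
real cleanup would have to run at a point where that cannot happen.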
> Empty directory residue with appid name in HA (highly-available) related
> directory of hdfs, not cleaned
> ------------------------------------------------------------------------------------------------------
>
> Key: FLINK-33288
> URL: https://issues.apache.org/jira/browse/FLINK-33288
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Configuration
> Affects Versions: 1.16.2, 1.17.1
> Reporter: Xin Chen
> Priority: Major
> Attachments: image-2023-10-17-16-43-07-859.png, screenshot-1.png,
> screenshot-2.png
>
>
> When I submitted a large number of jobs in Flink-on-YARN mode and they
> executed successfully, I unexpectedly found a large number of empty
> directories left in the directory configured by 'high-availability.storageDir'
> on HDFS, named after the YARN application IDs, as shown below. I believe these
> must be cleaned up! However, after verifying this in 1.16.2 and 1.17.1
> environments, I found that neither version solves this problem.
> My flink-conf.yaml setting for 'high-availability.storageDir':
> {code:yaml}
> high-availability.storageDir: hdfs://hdfsHACluster/flink/recovery
> {code}
> !screenshot-1.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)