[
https://issues.apache.org/jira/browse/FLINK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776047#comment-17776047
]
Xin Chen commented on FLINK-33288:
----------------------------------
The code shows that after a job completes, there is an action to clear its
data under the HA directory. If an exception occurs during this cleanup, a
WARN-level log is printed, and the message includes 'high availability
StorageDir'.
!screenshot-2.png!
But in reality, `removeJob(jobId, cleanupJobState)` only deletes the blob
subdirectory (/flink/recovery/application_1694077753088_0009/blob) of the
appid directory under the HA directory, along with the znode and the
ConfigMap in k8s. There is no action that deletes the parent directory
(/flink/recovery/application_1694077753088_0009) itself, so it is left behind
empty.
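
To illustrate the missing step, here is a minimal sketch of "delete the blob
subdirectory, then remove the appid directory if nothing is left in it". It is
not Flink code: the class and method names (HaDirCleanupSketch, cleanupAppDir,
removeIfEmpty) are hypothetical, and it uses java.nio on a local filesystem as
a stand-in for HDFS, so an actual fix would have to go through Flink's own
file system abstraction instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not part of Flink. A local directory stands in for
// the HDFS path configured as high-availability.storageDir.
final class HaDirCleanupSketch {

    // Deletes the "blob" subdirectory of the appid directory, then removes
    // the appid directory itself, but only if it is empty afterwards.
    // Returns true if the appid directory was removed.
    static boolean cleanupAppDir(Path appDir) throws IOException {
        deleteRecursively(appDir.resolve("blob"));
        return removeIfEmpty(appDir);
    }

    // Removes dir only when it is an existing directory with no entries.
    static boolean removeIfEmpty(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            if (entries.iterator().hasNext()) {
                return false; // something else still lives here, keep it
            }
        }
        Files.delete(dir);
        return true;
    }

    // Deletes root and everything below it; does nothing if root is absent.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path recovery = Files.createTempDirectory("recovery");
        Path appDir = recovery.resolve("application_1694077753088_0009");
        Files.createDirectories(appDir.resolve("blob"));
        boolean removed = cleanupAppDir(appDir);
        System.out.println("appid directory removed: " + removed);
        Files.delete(recovery);
    }
}
```

The emptiness check is deliberate: if the appid directory still holds anything
besides the blob subdirectory, it is left alone rather than deleted blindly.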
> Empty directory residue with appid name in HA(highly-available) related
> directory of hdfs, not cleaned
> ------------------------------------------------------------------------------------------------------
>
> Key: FLINK-33288
> URL: https://issues.apache.org/jira/browse/FLINK-33288
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Configuration
> Affects Versions: 1.16.2, 1.17.1
> Reporter: Xin Chen
> Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> When I submitted a large number of tasks in Flink-on-Yarn mode and they
> executed successfully, I unexpectedly found a large number of empty
> directories, named after the appids, left under the directory configured as
> 'high-availability.storageDir' on HDFS, as shown below. I believe these
> should be cleaned up. However, testing in 1.16.2 and 1.17.1 environments
> showed that neither version fixes this problem.
> My flink-conf.yaml setting for 'high-availability.storageDir':
> {code:java}
> high-availability.storageDir: hdfs://hdfsHACluster/flink/recovery
> {code}
> !screenshot-1.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)