[jira] [Commented] (FLINK-33288) Empty directory residue with appid name in HA(highly-available) related directory of hdfs, not cleaned

Xin Chen (Jira) Tue, 17 Oct 2023 00:42:04 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776047#comment-17776047
 ]


Xin Chen commented on FLINK-33288:
----------------------------------

The code shows that after the task completes, there is an action to clean up 
the data under the HA directory. If an exception occurs during this cleanup, 
a WARN-level log is printed that mentions 'high availability StorageDir'.
 !screenshot-2.png! 

In reality, however, `removeJob(jobId, cleanupJobState)` only deletes the blob 
subdirectory (/flink/recovery/application_1694077753088_0009/blob) of the 
appid directory under the HA directory, as well as the znode and the k8s 
ConfigMap. There is no action that deletes the parent directory 
(/flink/recovery/application_1694077753088_0009), so it is left behind empty.
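
The missing step could look roughly like the sketch below: after the blob 
subdirectory is gone, remove the appid directory only if nothing is left in 
it. This is a minimal illustration under assumptions, not the Flink 
implementation. It uses java.nio.file on a local path as a stand-in for HDFS, 
and the names EmptyHaDirCleanup and deleteIfEmpty are hypothetical. A real fix 
would have to go through the file system abstraction that the HA services 
already use.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Flink: removes a directory only when it is empty.
// A local path stands in for the HDFS appid directory (assumption for illustration).
final class EmptyHaDirCleanup {

    private EmptyHaDirCleanup() {}

    /**
     * Deletes dir if it exists, is a directory, and contains no entries.
     * Returns true only when the directory was actually removed.
     */
    static boolean deleteIfEmpty(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false; // missing path or regular file: nothing to clean
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            if (entries.iterator().hasNext()) {
                return false; // still holds data, leave it alone
            }
        }
        // Non-recursive delete: fails with DirectoryNotEmptyException if an
        // entry appeared between the check and the delete.
        Files.delete(dir);
        return true;
    }
}
```

Called on the appid directory once the blob subdirectory has been removed, 
deleteIfEmpty would drop the empty parent and leave it untouched if any other 
data is still present.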

> Empty directory residue with appid name in HA(highly-available) related 
> directory of hdfs, not cleaned
> ------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33288
>                 URL: https://issues.apache.org/jira/browse/FLINK-33288
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Configuration
>    Affects Versions: 1.16.2, 1.17.1
>            Reporter: Xin Chen
>            Priority: Major
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> When I submitted a large number of tasks in Flink-on-Yarn mode and they all 
> executed successfully, I unexpectedly found a large number of empty 
> directories, named after the appids, left under the directory configured by 
> 'high-availability.storageDir' on HDFS, as shown below. I believe these must 
> be cleaned up! However, I verified in both 1.16.2 and 1.17.1 environments 
> that neither version fixes this problem.
> My flink-conf.yaml setting for 'high-availability.storageDir':
> {code:java}
> high-availability.storageDir: hdfs://hdfsHACluster/flink/recovery
> {code}
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
