[ 
https://issues.apache.org/jira/browse/FLINK-34557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822447#comment-17822447
 ] 

tanliang commented on FLINK-34557:
----------------------------------

This issue may not be very urgent, but I hope someone can discuss it and 
raise any related questions.

> When the Flink task ends in application mode, there may be issues with the 
> Znode and HDFS files not being deleted
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-34557
>                 URL: https://issues.apache.org/jira/browse/FLINK-34557
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN, Runtime / Task
>    Affects Versions: 1.17.0, 1.16.2
>            Reporter: tanliang
>            Priority: Major
>         Attachments: image-2024-03-01-15-38-48-396.png, 
> image-2024-03-01-15-39-13-953.png, image-2024-03-01-15-39-39-524.png
>
>
> In Flink 1.16.2, we submit all tasks to Yarn in application mode. 
> However, in several situations the Znode and some files on HDFS are not 
> deleted. These should be deleted after the task stops; otherwise they 
> continue to occupy resources. The situations I have encountered are:
>  # After the Flink task is submitted to the cluster, if a jar package 
> conflicts or is missing, Yarn restarts the task multiple times and it 
> ultimately ends in a failed state. At this point the Znode persists, and 
> files with the corresponding appid remain in the '/.flink' and 
> '/flink/recovery' directories in HDFS;
>  # When the yarn kill command is used to kill a task, the task ends 
> immediately with the final state killed, and the residue is the same as in 
> the first case;
>  # When the Flink task is disconnected from zk (we will not analyze the 
> specific reason for the disconnection here), that is, when the jm container 
> loses its connection to zk, the task hangs and Yarn restarts it. After the 
> final disconnection the task eventually ends, with the same residue as 
> above.
> !image-2024-03-01-15-38-48-396.png|width=877,height=171!
> !image-2024-03-01-15-39-13-953.png|width=882,height=174!
>  
> !image-2024-03-01-15-39-39-524.png|width=1001,height=67!
>  
> *Add:*
> After consulting the community and colleagues, we found that the issue of 
> the Znode not being deleted had been raised in the community before. It was 
> later addressed by adding the closeAndCleanupAllData# method, which deletes 
> the data uniformly when a highly available cluster ends. However, in the 
> situations above, files and data are still left behind. In the yarn kill 
> case, after a task is successfully submitted to the cluster, Flink already 
> indicates in the console logs that HDFS files will be left behind. I don't 
> understand why the community kept this behavior instead of improving it. We 
> also believe that Znode residue should not exist: regardless of the task 
> status, it must be cleaned up after the task stops.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
