[
https://issues.apache.org/jira/browse/FLINK-34557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822447#comment-17822447
]
tanliang commented on FLINK-34557:
----------------------------------
This issue may not be very urgent, but I hope someone can discuss it and
raise any related questions.
> When the Flink task ends in application mode, there may be issues with the
> Znode and HDFS files not being deleted
> -----------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-34557
> URL: https://issues.apache.org/jira/browse/FLINK-34557
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN, Runtime / Task
> Affects Versions: 1.17.0, 1.16.2
> Reporter: tanliang
> Priority: Major
> Attachments: image-2024-03-01-15-38-48-396.png,
> image-2024-03-01-15-39-13-953.png, image-2024-03-01-15-39-39-524.png
>
>
> We run Flink 1.16.2 and submit all tasks to Yarn in application mode.
> However, in several situations the Znode and some files on HDFS are not
> deleted. They should be deleted once the task has stopped; otherwise they
> keep occupying resources. These are the situations I have encountered:
> # After the Flink task is submitted to the cluster, if a jar package
> conflicts or is missing, Yarn restarts the task multiple times and it
> ultimately ends in failure. At this point the Znode persists, and files
> with the corresponding appids remain in the '/.flink' directory and the
> '/flink/recovery' directory in HDFS;
> # When the yarn kill command is used to kill a task, the task ends
> immediately with the final state KILLED, and the result is the same as in
> the first case;
> # When the jm container loses its connection to zk (we will not analyze
> the specific reason for the disconnection), the task hangs and is
> restarted by Yarn. After the last such disconnection, the task eventually
> ends and the same result as above appears.
> !image-2024-03-01-15-38-48-396.png|width=877,height=171!
> !image-2024-03-01-15-39-13-953.png|width=882,height=174!
>
> !image-2024-03-01-15-39-39-524.png|width=1001,height=67!
>
> *Add:*
> After consulting the community and other colleagues, we found that the
> community had previously raised the issue of the Znode not being deleted.
> It was later addressed by adding the closeAndCleanupAllData method, which
> deletes the data uniformly when a highly available cluster ends. However,
> in the situations above, files and data are still left behind. For the
> yarn kill case, after a task is successfully submitted to the cluster,
> Flink's console log already indicates that HDFS files would be left
> behind. I don't understand why the community has kept this behavior
> instead of improving it. We also believe that Znode residue should never
> exist: regardless of the final task status, it must be cleaned up after
> the task stops.
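
To check which of the locations described above still exist after a task has ended, the candidate paths for one application could be enumerated with a small helper like the sketch below. It is only an illustration under stated assumptions: the function name `residue_paths`, the ZooKeeper root `/flink`, and the example application id are hypothetical and not taken from Flink, and the actual locations depend on the cluster's high-availability configuration. The two HDFS directories are the ones named in the description.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ResiduePaths:
    """Candidate locations that may be left behind for one Yarn application."""
    znode: str
    hdfs_dirs: List[str]


def residue_paths(
    app_id: str,
    zk_root: str = "/flink",  # assumption: ZooKeeper root path used by the cluster
    hdfs_roots: Sequence[str] = ("/.flink", "/flink/recovery"),  # from the description
) -> ResiduePaths:
    """Build the Znode and HDFS paths to inspect for a given application id."""
    if not app_id.startswith("application_"):
        raise ValueError(f"not a Yarn application id: {app_id!r}")
    znode = f"{zk_root.rstrip('/')}/{app_id}"
    hdfs_dirs = [f"{root.rstrip('/')}/{app_id}" for root in hdfs_roots]
    return ResiduePaths(znode=znode, hdfs_dirs=hdfs_dirs)


if __name__ == "__main__":
    # Hypothetical application id, for illustration only.
    paths = residue_paths("application_1700000000000_0001")
    print(paths.znode)
    for d in paths.hdfs_dirs:
        print(d)
```

The returned paths can then be inspected by hand, for example with `hdfs dfs -ls` for the HDFS directories and `ls` in the ZooKeeper CLI for the Znode, before deciding whether anything should be removed.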
--
This message was sent by Atlassian Jira
(v8.20.10#820010)