[
https://issues.apache.org/jira/browse/FLINK-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193285#comment-17193285
]
Yi Tang commented on FLINK-19171:
---------------------------------
We can fix this by fetching the pod again before checking its status. If the
pod no longer exists, we can run the subsequent termination logic without
stopping it.
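A rough sketch of that idea, not the actual Flink code: the types below (Pod, PodClient, the ledger set, onPodsChanged) are hypothetical stand-ins made up for illustration, and the real KubernetesResourceManager and Flink's Kubernetes client will look different. It assumes that a lookup can report "pod not found" and that the RM tracks allocated pods by name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-ins; none of these types are Flink's actual classes.
final class PodLedgerSketch {

    /** Minimal pod view: a name plus whether it reached a terminal phase. */
    static final class Pod {
        final String name;
        final boolean terminated;

        Pod(String name, boolean terminated) {
            this.name = name;
            this.terminated = terminated;
        }
    }

    /** Hypothetical client; an empty Optional means the pod no longer exists. */
    interface PodClient {
        Optional<Pod> getPod(String name);

        void stopPod(String name);
    }

    private final PodClient client;
    private final Set<String> ledger = new HashSet<>();

    PodLedgerSketch(PodClient client) {
        this.client = client;
    }

    /** Records a pod as allocated, as the RM does once the pod is created. */
    void track(String podName) {
        ledger.add(podName);
    }

    boolean isTracked(String podName) {
        return ledger.contains(podName);
    }

    /** Re-fetches each reported pod before deciding whether to release it. */
    void onPodsChanged(List<Pod> reported) {
        for (Pod stale : reported) {
            Optional<Pod> latest = client.getPod(stale.name);
            if (!latest.isPresent()) {
                // Pod was deleted (possibly while Pending): nothing to stop, just release it.
                ledger.remove(stale.name);
            } else if (latest.get().terminated) {
                // Pod still exists but has terminated: stop it, then release it.
                client.stopPod(stale.name);
                ledger.remove(stale.name);
            }
            // Otherwise the pod is still alive and stays in the ledger.
        }
    }
}
```

With this shape, a Pending pod that was deleted manually is released from the ledger even though it never reported a terminated state, so a later resource request could allocate a new pod.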
> K8s Resource Manager may lead to resource leak after pod deleted
> ----------------------------------------------------------------
>
> Key: FLINK-19171
> URL: https://issues.apache.org/jira/browse/FLINK-19171
> Project: Flink
> Issue Type: Bug
> Reporter: Yi Tang
> Priority: Minor
>
> {code:java}
> private void terminatedPodsInMainThread(List<KubernetesPod> pods) {
>     getMainThreadExecutor().execute(() -> {
>         for (KubernetesPod pod : pods) {
>             if (pod.isTerminated()) {
>                 ...
>             }
>         }
>     });
> }
> {code}
> It looks like the RM only removes a pod from its ledger if the pod
> "isTerminated",
> even though the pod is counted in the ledger as soon as it is created.
> However, checking "isTerminated" alone is not sufficient, e.g. when a Pending
> pod is deleted manually.
> After that, a new job that requires more resources cannot trigger the
> allocation of a new pod.
>
> Please let me know if I have misunderstood, thanks.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)