[
https://issues.apache.org/jira/browse/FLINK-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193481#comment-17193481
]
Xintong Song commented on FLINK-19171:
--------------------------------------
[~yittg],
Thanks for sharing the details on your case.
Correct me if I'm wrong, but it seems to me that the real problem in your case
is that Flink does not remove the pending pods it no longer needs when the job
is canceled. If the pending pods were properly removed, you would not need to
delete them manually, and later jobs should not be affected.
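To make the accounting concrete, here is a minimal, self-contained sketch of
the scenario. It is not Flink's code: the class PodLedgerSketch, the Phase
enum, and all method names are hypothetical and only model a ledger that
counts a pod from the moment it is requested. The sketch assumes that a
manually deleted Pending pod is never observed as terminated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the pod accounting discussed here; not Flink's classes.
public class PodLedgerSketch {

    public enum Phase { PENDING, RUNNING, TERMINATED }

    // Pods the resource manager believes it owns, keyed by pod name.
    private final Map<String, Phase> ledger = new HashMap<>();

    // A pod is accounted for as soon as it has been requested.
    public void requestPod(String name) {
        ledger.put(name, Phase.PENDING);
    }

    // Behavior described in the ticket: a pod leaves the ledger only if it
    // is observed as terminated.
    public void onPodsStopped(Map<String, Phase> observed) {
        observed.forEach((name, phase) -> {
            if (phase == Phase.TERMINATED) {
                ledger.remove(name);
            }
        });
    }

    // Cleanup suggested above: drop pending pods that are no longer needed,
    // for example when the job is canceled. Returns the released pod names.
    public List<String> releasePendingPods() {
        List<String> released = new ArrayList<>();
        ledger.entrySet().removeIf(entry -> {
            if (entry.getValue() == Phase.PENDING) {
                released.add(entry.getKey());
                return true;
            }
            return false;
        });
        return released;
    }

    // New pods are requested only for demand not covered by tracked pods.
    public int podsToRequest(int required) {
        return Math.max(0, required - ledger.size());
    }

    public static void main(String[] args) {
        PodLedgerSketch rm = new PodLedgerSketch();
        rm.requestPod("taskmanager-1");
        rm.requestPod("taskmanager-2");

        // Both pods are deleted by hand while still Pending, so they are
        // never reported as terminated and stay in the ledger.
        rm.onPodsStopped(Map.of(
                "taskmanager-1", Phase.PENDING,
                "taskmanager-2", Phase.PENDING));
        System.out.println(rm.podsToRequest(2)); // prints 0: stale entries block allocation

        // Releasing pending pods on cancellation clears the stale entries.
        rm.releasePendingPods();
        System.out.println(rm.podsToRequest(2)); // prints 2
    }
}
```

In this model the later job is blocked only because the stale Pending entries
still count against its demand; once they are released, the demand turns into
new pod requests again.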
I think it is reasonable for Flink to assume that no third party communicates
with Kubernetes and manipulates its pods, unless that third party is
absolutely necessary. That's why I asked about the reason for the manual pod
deletions.
FYI, another ticket (FLINK-18229) is tracking the issue of cleaning up pending
workers. Hopefully that solves your problem.
We are aiming to resolve both issues (FLINK-13554/18229) in the 1.12 release.
Unfortunately, both are currently blocked by other issues.
> K8s Resource Manager may lead to resource leak after pod deleted
> ----------------------------------------------------------------
>
> Key: FLINK-19171
> URL: https://issues.apache.org/jira/browse/FLINK-19171
> Project: Flink
> Issue Type: Bug
> Reporter: Yi Tang
> Priority: Minor
>
> {code:java}
> private void terminatedPodsInMainThread(List<KubernetesPod> pods) {
>     getMainThreadExecutor().execute(() -> {
>         for (KubernetesPod pod : pods) {
>             if (pod.isTerminated()) {
>                 ...
>             }
>         }
>     });
> }
> {code}
> It looks like the RM only removes a pod from its ledger if the pod
> "isTerminated", while the pod is accounted for as soon as it is created.
> However, checking "isTerminated" alone is not sufficient, e.g. when a Pending
> pod is deleted manually.
> After that, a new job that requires more resources cannot trigger the
> allocation of a new pod.
>
> Please let me know if I have misunderstood, thanks.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)