[
https://issues.apache.org/jira/browse/FLINK-30036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635064#comment-17635064
]
Yang Wang commented on FLINK-30036:
-----------------------------------
After more investigation, it seems that terminating pods are counted toward the
used quota, so I think this ticket is a valid issue. We may need a config
option that enables force-delete when a pod might be stuck in terminating (e.g.
when the node is not ready).
I have one more concern: a node being not ready does not always mean the pod
will be stuck in the terminating status. A force delete will send a SIGKILL to
the pod, so the TM will not have a chance to clean up.
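
One way to balance the two points above is to force-delete only when the option is enabled, the node is not ready, and the pod has already outlived its graceful termination window. The following is a minimal sketch of that decision only, not Flink's implementation: the class ForceDeletePolicy, the extraWait setting, and the deletionRequestedAt argument are all hypothetical names chosen for illustration, and no Kubernetes client call is made.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not part of Flink or any Kubernetes client.
final class ForceDeletePolicy {
    private final boolean forceDeleteEnabled; // assumed config option, off by default
    private final Duration extraWait;         // assumed safety margin after the grace period

    ForceDeletePolicy(boolean forceDeleteEnabled, Duration extraWait) {
        this.forceDeleteEnabled = forceDeleteEnabled;
        this.extraWait = extraWait;
    }

    /**
     * @param nodeReady           whether the pod's node currently reports Ready
     * @param deletionRequestedAt when graceful deletion was requested, or null if
     *                            the pod is not terminating
     * @param gracePeriod         the pod's termination grace period
     * @param now                 the current time
     */
    boolean shouldForceDelete(
            boolean nodeReady, Instant deletionRequestedAt, Duration gracePeriod, Instant now) {
        if (!forceDeleteEnabled || deletionRequestedAt == null) {
            return false; // feature disabled, or the pod is not terminating at all
        }
        if (nodeReady) {
            return false; // the node can still finish the graceful shutdown
        }
        // Only give up on graceful termination once the grace period plus the
        // extra margin has fully elapsed, so the TM keeps its chance to clean up.
        Instant deadline = deletionRequestedAt.plus(gracePeriod).plus(extraWait);
        return now.isAfter(deadline);
    }
}
```

With this shape, a pod on a node that is only briefly not ready is left alone, and only a pod that stays in terminating past the deadline becomes a candidate for force deletion.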
> Force delete pod when k8s node is not ready
> --------------------------------------------
>
> Key: FLINK-30036
> URL: https://issues.apache.org/jira/browse/FLINK-30036
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Reporter: Peng Yuan
> Priority: Major
> Labels: pull-request-available
>
> When a K8s node is in the NotReady state, the taskmanager pods scheduled on
> it stay in the terminating state indefinitely. When the Flink cluster has a
> strict quota, the terminating pods hold their resources the whole time. As a
> result, new taskmanager pods cannot obtain resources and cannot be started.
>