[
https://issues.apache.org/jira/browse/FLINK-30036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642330#comment-17642330
]
Peng Yuan commented on FLINK-30036:
-----------------------------------
TM Pod in terminating status:
!https://intranetproxy.alipay.com/skylark/lark/0/2022/png/44456401/1669954379685-387ff2de-2e29-4b43-9ddf-90ca1b185671.png|width=1429,id=uec269415!
The conditions of the node that the pod is running on are:
!https://intranetproxy.alipay.com/skylark/lark/0/2022/png/44456401/1669953309755-6d6d95b9-bdf3-4c0c-bb55-d13a459c8dbc.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0!
!https://intranetproxy.alipay.com/skylark/lark/0/2022/png/44456401/1669953318718-81b4db9a-1bc1-46b7-8f94-792819c4d277.png?x-oss-process=image%2Fresize%2Cw_1500%2Climit_0!
We can see that when the kubelet cannot post the node status, the node
status becomes Unknown.
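
To make the check concrete, here is a minimal sketch of the decision logic, not the actual change in the pull request. It assumes the node conditions and the pod's deletion timestamp have already been fetched from the API server. The names `NodeCondition`, `node_is_ready` and `should_force_delete` are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeCondition:
    """Hypothetical, simplified view of one entry in node.status.conditions."""
    type: str    # e.g. "Ready", "MemoryPressure"
    status: str  # "True", "False" or "Unknown"


def node_is_ready(conditions: List[NodeCondition]) -> bool:
    """Return True only if the node reports a Ready condition with status "True"."""
    for condition in conditions:
        if condition.type == "Ready":
            return condition.status == "True"
    # Assumption: a node without a Ready condition is treated as not ready.
    return False


def should_force_delete(conditions: List[NodeCondition],
                        deletion_timestamp: Optional[str]) -> bool:
    """A pod is a force-delete candidate if it is already terminating
    (deletion timestamp set) and its node is not ready (False or Unknown)."""
    terminating = deletion_timestamp is not None
    return terminating and not node_is_ready(conditions)


if __name__ == "__main__":
    # The situation in the screenshots: kubelet stopped posting status.
    unknown_node = [NodeCondition("Ready", "Unknown")]
    print(should_force_delete(unknown_node, "2022-12-02T04:00:00Z"))
```

The function only decides whether a pod qualifies. The deletion itself (with a grace period of 0) is left out on purpose, since how it is issued depends on the client being used.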
> Force delete pod when k8s node is not ready
> --------------------------------------------
>
> Key: FLINK-30036
> URL: https://issues.apache.org/jira/browse/FLINK-30036
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Reporter: Peng Yuan
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-11-17-10-25-59-945.png
>
>
> When a K8s node is in the NotReady state, a TaskManager pod scheduled on
> it stays in the Terminating state indefinitely. When the Flink cluster has a
> strict quota, the terminating pod keeps holding its resources. As a result,
> new TaskManager pods cannot obtain resources and cannot be started.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)