[ https://issues.apache.org/jira/browse/FLINK-38252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016203#comment-18016203 ]
Baozhu Zhao commented on FLINK-38252:
-------------------------------------

[~xtsong] Hi, could you assign this issue to me?

> ResourceManager will not apply for a new pod when pending pod is deleted
> ------------------------------------------------------------------------
>
>                 Key: FLINK-38252
>                 URL: https://issues.apache.org/jira/browse/FLINK-38252
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.17.2, 1.19.3
>            Reporter: Baozhu Zhao
>            Priority: Minor
>
> Our Flink job is deployed on k8s.
>
> The SRE of the k8s cluster periodically cleans up pending pods, but Flink
> does not handle the deletion event for a pending pod. As a result, the Flink
> job never applies for new pods and ultimately fails due to insufficient
> resources.
>
> This problem can be reproduced on a small k8s cluster.
> For example, if the k8s cluster has only 10 CPU cores in total, the Flink job
> configuration requests four 5-core pods, and the pending pods are deleted
> before the job's resource request times out, the ResourceManager will not
> apply for new pods.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)