Baozhu Zhao created FLINK-38252:
-----------------------------------

             Summary: ResourceManager will not apply for a new pod when pending pod is deleted
                 Key: FLINK-38252
                 URL: https://issues.apache.org/jira/browse/FLINK-38252
             Project: Flink
          Issue Type: Bug
          Components: Deployment / Kubernetes
    Affects Versions: 1.19.3, 1.17.2, 2.1.0
         Environment: Flink version: 1.17
This problem can be reproduced on a small k8s cluster. For example, if the k8s cluster has only 10 CPU cores in total, the Flink job configuration requests four 5-core pods, and the pending pods are deleted before the job's resource request times out, the ResourceManager will not request new pods.
            Reporter: Baozhu Zhao


Our Flink job is deployed on k8s. The SRE team of the k8s cluster periodically cleans up pending pods, but Flink does not handle the deletion event for a pending pod. As a result, the Flink job never requests replacement pods and ultimately fails due to insufficient resources.
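
The reproduction scenario can be sketched as a toy model. This is not Flink or Kubernetes code: `ToyCluster`, `request_pod`, `delete_pending`, and `outstanding_pods` are hypothetical names used only to illustrate the arithmetic (10 cores, four 5-core pods) and the difference between the reported behavior and what we would expect, assuming a fix that re-requests pods after a pending pod is deleted.

```python
# Toy model of the scenario described in this issue; NOT Flink or Kubernetes code.
# All names here are hypothetical and exist only for illustration.

from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    total_cores: int
    running: list = field(default_factory=list)  # cores of scheduled pods
    pending: list = field(default_factory=list)  # cores of unschedulable pods

    def free_cores(self) -> int:
        return self.total_cores - sum(self.running)

    def request_pod(self, cores: int) -> None:
        # A pod is scheduled only if enough free cores remain; otherwise it stays pending.
        if cores <= self.free_cores():
            self.running.append(cores)
        else:
            self.pending.append(cores)

    def delete_pending(self) -> int:
        # Models an external cleanup of all pending pods (e.g. by the cluster's SRE team).
        deleted = len(self.pending)
        self.pending.clear()
        return deleted


def outstanding_pods(cluster: ToyCluster, required: int, cores: int,
                     rerequest_on_delete: bool) -> int:
    """Return how many required pods are neither running nor pending after cleanup."""
    for _ in range(required):
        cluster.request_pod(cores)
    deleted = cluster.delete_pending()
    if rerequest_on_delete:
        # Assumed desired behavior: every deleted pending pod triggers a new request.
        for _ in range(deleted):
            cluster.request_pod(cores)
    return required - len(cluster.running) - len(cluster.pending)


# Reported behavior: deleted pending pods are never requested again.
reported = outstanding_pods(ToyCluster(total_cores=10), required=4, cores=5,
                            rerequest_on_delete=False)
# Expected behavior (an assumption about the fix): deletions trigger new requests.
expected = outstanding_pods(ToyCluster(total_cores=10), required=4, cores=5,
                            rerequest_on_delete=True)
print(reported, expected)
```

In the toy model, two of the four required pods are left untracked after the cleanup in the reported case, which corresponds to the job waiting for resources that nothing is requesting any more.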