Baozhu Zhao created FLINK-38252:
-----------------------------------
Summary: ResourceManager will not apply for a new pod when pending
pod is deleted
Key: FLINK-38252
URL: https://issues.apache.org/jira/browse/FLINK-38252
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Affects Versions: 1.19.3, 1.17.2, 2.1.0
Environment: Flink version: 1.17
The problem can be reproduced on a small k8s cluster.
For example, if the k8s cluster has only 10 CPU cores in total and the Flink job
configuration requests four 5-core pods, at most two pods can be scheduled and
the others stay pending. If the pending pods are then deleted before the job's
resource request times out, the ResourceManager does not request replacement
pods.
Reporter: Baozhu Zhao
Our Flink job is deployed on k8s.
The SRE team of the k8s cluster periodically cleans up pending pods, but Flink
does not handle the deletion event for a pending pod. As a result, the job never
requests new pods and ultimately fails due to insufficient resources.
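
The scenario from the Environment field can be illustrated with a small toy
model. This is only a sketch under stated assumptions: it is not Flink or
Kubernetes code, the names ToyCluster, request_pod, delete_pending and simulate
are hypothetical, and it ignores CPU that a real cluster reserves for system
components. It contrasts the reported behaviour (deleted pending pods are
forgotten) with the behaviour the report expects (deleted pending pods are
requested again).

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model of a k8s cluster (hypothetical, not a real API)."""
    total_cores: int
    running: list = field(default_factory=list)  # cores of scheduled pods
    pending: list = field(default_factory=list)  # cores of unschedulable pods

    def request_pod(self, cores: int) -> None:
        # Schedule the pod if it fits in the remaining capacity,
        # otherwise leave it pending.
        if sum(self.running) + cores <= self.total_cores:
            self.running.append(cores)
        else:
            self.pending.append(cores)

    def delete_pending(self) -> list:
        # Models the periodic cleanup: drop all pending pods
        # and return the list of what was removed.
        deleted, self.pending = self.pending, []
        return deleted


def simulate(rerequest_on_delete: bool) -> tuple:
    """Return (running pods, pending pods) after the cleanup."""
    cluster = ToyCluster(total_cores=10)
    for _ in range(4):              # the job needs four 5-core pods
        cluster.request_pod(5)
    deleted = cluster.delete_pending()
    if rerequest_on_delete:
        # Expected behaviour: react to the deletion by requesting again.
        for cores in deleted:
            cluster.request_pod(cores)
    return len(cluster.running), len(cluster.pending)


print(simulate(rerequest_on_delete=False))  # (2, 0)
print(simulate(rerequest_on_delete=True))   # (2, 2)
```

In the first case the job holds two of the four pods it needs and has no
outstanding requests, so it can never reach its target. In the second case the
two replacement pods are pending again and could be scheduled once capacity
becomes available.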
--
This message was sent by Atlassian Jira
(v8.20.10#820010)