Baozhu Zhao created FLINK-38252:
-----------------------------------

             Summary: ResourceManager will not request a new pod when a pending 
pod is deleted
                 Key: FLINK-38252
                 URL: https://issues.apache.org/jira/browse/FLINK-38252
             Project: Flink
          Issue Type: Bug
          Components: Deployment / Kubernetes
    Affects Versions: 1.19.3, 1.17.2, 2.1.0
         Environment: Flink version: 1.17

The problem can be reproduced on a small k8s cluster.
For example, suppose the k8s cluster has only 10 CPU cores in total and the 
Flink job configuration requests four 5-core pods, so at most two pods can be 
scheduled and the rest stay pending. If the pending pods are deleted before 
the job's resource request times out, the ResourceManager does not request 
new pods.
            Reporter: Baozhu Zhao


Our Flink job is deployed on k8s.

The SRE of the k8s cluster periodically cleans up pending pods, but Flink does 
not handle the deletion event for a pending pod. As a result, the Flink job 
never requests replacement pods and ultimately fails due to insufficient 
resources.
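
To make the scenario concrete, here is a minimal toy model in Python of the 
reproduction case (10 CPU cores, four 5-core pods). It is only a sketch and 
not Flink's actual ResourceManager code: ToyCluster, simulate, and the 
handle_delete_event flag are hypothetical names introduced for illustration, 
and the "-retry" replacement naming is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy scheduler: a pod runs if enough CPU is free, otherwise it stays pending."""
    total_cpu: int
    running: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def request_pod(self, name, cpu):
        used = sum(c for _, c in self.running)
        if used + cpu <= self.total_cpu:
            self.running.append((name, cpu))
        else:
            self.pending.append((name, cpu))

    def delete_pending(self):
        """Simulate an operator cleaning up pending pods; returns the deleted names."""
        deleted = [name for name, _ in self.pending]
        self.pending.clear()
        return deleted


def simulate(handle_delete_event):
    """Return (running pods, pending pods, pods the toy resource manager expects)."""
    cluster = ToyCluster(total_cpu=10)
    expected = []  # pod names the toy resource manager believes it has requested
    for i in range(4):
        name = f"taskmanager-{i}"
        cluster.request_pod(name, cpu=5)
        expected.append(name)

    # Pending pods are deleted before the resource request times out.
    for name in cluster.delete_pending():
        if handle_delete_event:
            # Desired behavior (assumption): forget the deleted pod, request a new one.
            expected.remove(name)
            replacement = f"{name}-retry"
            cluster.request_pod(replacement, cpu=5)
            expected.append(replacement)
    return len(cluster.running), len(cluster.pending), len(expected)


print(simulate(handle_delete_event=False))  # deletion ignored
print(simulate(handle_delete_event=True))   # deletion triggers a new request
```

When the deletion is ignored, the toy resource manager still expects four 
pods while only two exist and none are pending, which mirrors the stuck state 
described above. When the deletion is handled, two replacement requests are 
pending again and can be scheduled once capacity frees up.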



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
