[ 
https://issues.apache.org/jira/browse/FLINK-38252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016203#comment-18016203
 ] 

Baozhu Zhao commented on FLINK-38252:
-------------------------------------

Hi [~xtsong], could you assign this issue to me?
 

> ResourceManager will not apply for a new pod when pending pod is deleted
> ------------------------------------------------------------------------
>
>                 Key: FLINK-38252
>                 URL: https://issues.apache.org/jira/browse/FLINK-38252
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.17.2, 1.19.3
>            Reporter: Baozhu Zhao
>            Priority: Minor
>
> Our Flink job is deployed on k8s.
>  
> The SRE of the k8s cluster periodically cleans up pending pods, but Flink 
> does not handle the deletion event for a pending pod. As a result, the Flink 
> job never applies for replacement pods and ultimately fails due to 
> insufficient resources.
>  
> This problem can be reproduced on a small k8s cluster.
> For example, if the k8s cluster has only 10 CPU cores in total and the Flink 
> job configuration requests four 5-core pods, not all of the pods can be 
> scheduled and the rest stay pending. If those pending pods are deleted 
> before the job's resource request times out, the ResourceManager will not 
> apply for new pods.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
