[ 
https://issues.apache.org/jira/browse/FLINK-17976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129344#comment-17129344
 ] 

Yang Wang commented on FLINK-17976:
-----------------------------------

Here we have two questions.
 # Running + Pending pods should not be more than the Flink job needs. Once a 
running pod is killed/deleted, only one new pod should be started. I will try 
to reproduce this situation; if it can happen, then it is a bug and needs to 
be fixed.
 # The pending pods need to be released after a timeout (e.g. 10 minutes). I am 
afraid that we currently do not have a timeout for pending pods/containers 
in the K8s/Yarn deployments. Pods are usually pending because there are not 
enough resources. Once a pod is allocated and launched, it can be released via 
the idle timeout. Moreover, if the job is running, it is reasonable to keep the 
pending pods. How pending pods/containers are released could be improved 
in the future.
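
To make the two points above concrete, here is a minimal Python sketch of the
intended bookkeeping. It is not Flink's resource manager code; the names
PendingPod, pods_to_request and expired_pending_pods, the job_running flag and
the 600-second default are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingPod:
    """Hypothetical record of a pod that was requested but not yet scheduled."""
    name: str
    requested_at: float  # request time in seconds


def pods_to_request(required: int, running: int, pending: int) -> int:
    """Number of new pods to request so that running + pending <= required."""
    return max(0, required - (running + pending))


def expired_pending_pods(pending, now, timeout_seconds=600.0, job_running=False):
    """Pending pods that have waited longer than the (assumed) timeout.

    While the job is running, pending pods are kept, so nothing is returned.
    """
    if job_running:
        return []
    return [p for p in pending if now - p.requested_at > timeout_seconds]
```

With 4 required pods, 3 running and none pending, pods_to_request returns 1,
which matches the expectation that one killed pod leads to exactly one new pod.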

I believe that once the first issue is resolved, your K8s cluster will no 
longer be polluted with many pending pods. And once the Flink job fails, all 
the pending pods can finally be released.

> Test native K8s integration
> ---------------------------
>
>                 Key: FLINK-17976
>                 URL: https://issues.apache.org/jira/browse/FLINK-17976
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.0
>            Reporter: Till Rohrmann
>            Assignee: Robert Metzger
>            Priority: Critical
>              Labels: release-testing
>             Fix For: 1.11.0
>
>         Attachments: enough_tm_wait_5min.txt
>
>
> Test Flink's native K8s integration:
> * session mode
> * application mode
> * custom Flink image
> * custom configuration and log properties



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
