[
https://issues.apache.org/jira/browse/FLINK-17976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129344#comment-17129344
]
Yang Wang commented on FLINK-17976:
-----------------------------------
Here we have two questions.
# The number of running plus pending pods should not exceed what the Flink job
needs. Once a running pod is killed or deleted, only one new pod should be
started. I will try to reproduce this situation; if it can happen, it is a bug
and needs to be fixed.
# The pending pods need to be released after a timeout (e.g. 10 minutes). I am
afraid that we currently do not have a timeout for pending pods/containers in
K8s/Yarn deployments. Pods are usually pending because there are not enough
resources. Once a pod is allocated and launched, it can be released via the
idle timeout. Moreover, if the job is running, it is reasonable to keep the
pending pods. How pending pods/containers are released could be improved in
the future.
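To make the two points concrete, here is a minimal sketch of the intended
bookkeeping. It is not Flink's resource manager code: the class PodTracker,
its method names, the "tm-N" pod names and the 600 second default are all
hypothetical and only model the behavior described above.

```python
from dataclasses import dataclass, field


@dataclass
class PodTracker:
    """Hypothetical model of pod bookkeeping; not Flink's implementation."""

    required: int                     # number of pods the job needs
    pending_timeout_s: float = 600.0  # assumed timeout (10 minutes)
    running: set = field(default_factory=set)
    pending: dict = field(default_factory=dict)  # pod name -> request time
    _counter: int = 0

    def reconcile(self, now: float) -> list:
        """Request only the pods that are missing, so that
        running + pending never exceeds `required`."""
        missing = self.required - len(self.running) - len(self.pending)
        requested = []
        for _ in range(max(missing, 0)):
            self._counter += 1
            name = f"tm-{self._counter}"
            self.pending[name] = now
            requested.append(name)
        return requested

    def on_pod_started(self, name: str) -> None:
        """Move a pod from pending to running once it is launched."""
        if name in self.pending:
            del self.pending[name]
            self.running.add(name)

    def on_pod_terminated(self, name: str, now: float) -> list:
        """A killed/deleted pod triggers exactly one replacement request."""
        self.running.discard(name)
        self.pending.pop(name, None)
        return self.reconcile(now)

    def release_stale_pending(self, now: float, job_running: bool) -> list:
        """Release pending pods older than the timeout, but keep them
        while the job is still running."""
        if job_running:
            return []
        stale = [n for n, t in self.pending.items()
                 if now - t >= self.pending_timeout_s]
        for n in stale:
            del self.pending[n]
        return stale
```

With required=3, killing one running pod makes on_pod_terminated return a
single new pod name, and a repeated reconcile call requests nothing further.
The pending pod is only dropped by release_stale_pending once the timeout has
passed and the job is no longer running.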
I believe that once the first issue is resolved, your K8s cluster will no
longer be polluted with many pending pods. And once the Flink job fails, all
the pending pods can finally be released.
> Test native K8s integration
> ---------------------------
>
> Key: FLINK-17976
> URL: https://issues.apache.org/jira/browse/FLINK-17976
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.11.0
> Reporter: Till Rohrmann
> Assignee: Robert Metzger
> Priority: Critical
> Labels: release-testing
> Fix For: 1.11.0
>
> Attachments: enough_tm_wait_5min.txt
>
>
> Test Flink's native K8s integration:
> * session mode
> * application mode
> * custom Flink image
> * custom configuration and log properties
--
This message was sent by Atlassian Jira
(v8.3.4#803005)