[ https://issues.apache.org/jira/browse/FLINK-17976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129249#comment-17129249 ]
Robert Metzger commented on FLINK-17976:
----------------------------------------
Thanks a lot for your comments.
> If you mean the jobmanager web dashboard are public accessible when service
> exposed type is LoadBalancer, then it is true. Just as you say, maybe we need
> to add a warning for users. However, providing a separate LB for each Flink
> cluster is too expensive, users usually use customized ingress.
I will add a warning to the docs.
> Currently, if the TaskManager is launched successfully, it could be released
> after idle timeout. However, it seems that your cluster does not have enough
> resource, then all the pods are pending. It is an expected behavior just like
> YARN.
> If you kill/delete an active pod, it will be terminated and a new one will be
> allocated. So the pending pods increase. Once it is launched and register to
> Flink ResourceManager, the pending pods will decrease.
The scenario was the following:
Running TaskManagers: 4
Pending/Requested TaskManagers: 30
... then I killed one of the running TaskManagers ...
Running TaskManagers: 4
Pending/Requested TaskManagers: 56
Expected behavior: The number of "Pending/Requested TaskManagers" should at
least not increase. Ideally, the {{KubernetesResourceManager}} would cancel
pending TMs after a timeout of, say, 10 minutes.
Actual behavior: The number of pending TaskManagers goes up even though there
are already plenty of unfulfilled requests pending.
Why is this bad? My Kubernetes cluster was basically polluted / spammed with
pending pods.
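To make the expected accounting concrete, here is a minimal sketch in Python.
It is a toy model, not Flink code: the class {{PendingRequestTracker}}, its
method names, and the idea of a fixed "required" worker count are assumptions
made for illustration only. It shows two things: a new request is issued only
when running plus pending workers do not already cover the demand, and pending
requests older than a timeout are cancelled.

```python
from dataclasses import dataclass, field


@dataclass
class PendingRequestTracker:
    """Toy model of worker-request accounting (hypothetical, not Flink's API)."""

    required: int                     # total workers the jobs need (assumed fixed)
    pending_timeout_s: float = 600.0  # the "say 10 minutes" from the comment
    running: int = 0
    pending: list = field(default_factory=list)  # request timestamps, oldest first

    def reconcile(self, now):
        """Request only what running + pending workers do not already cover."""
        missing = self.required - self.running - len(self.pending)
        for _ in range(max(0, missing)):
            self.pending.append(now)

    def on_worker_started(self):
        """A pending pod was scheduled and registered: move it to running."""
        if self.pending:
            self.pending.pop(0)
        self.running += 1

    def on_worker_killed(self, now):
        """A running worker died: request at most one replacement."""
        self.running -= 1
        self.reconcile(now)

    def expire(self, now):
        """Cancel pending requests older than the timeout; return how many."""
        keep = [t for t in self.pending if now - t < self.pending_timeout_s]
        cancelled = len(self.pending) - len(keep)
        self.pending = keep
        return cancelled


if __name__ == "__main__":
    tracker = PendingRequestTracker(required=34, running=4)
    tracker.reconcile(now=0.0)            # running=4, pending=30
    tracker.on_worker_killed(now=60.0)    # running=3, pending=31
    tracker.on_worker_started()           # running=4, pending=30
    print(tracker.running, len(tracker.pending))
    print(tracker.expire(now=600.0))      # cancels the requests made at t=0
```

In this model, killing one running worker adds exactly one replacement
request, and once the freed capacity lets a pending pod start, the counts are
back at 4 running and 30 pending rather than growing to 56.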
> Test native K8s integration
> ---------------------------
>
> Key: FLINK-17976
> URL: https://issues.apache.org/jira/browse/FLINK-17976
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.11.0
> Reporter: Till Rohrmann
> Assignee: Robert Metzger
> Priority: Critical
> Labels: release-testing
> Fix For: 1.11.0
>
> Attachments: enough_tm_wait_5min.txt
>
>
> Test Flink's native K8s integration:
> * session mode
> * application mode
> * custom Flink image
> * custom configuration and log properties