[
https://issues.apache.org/jira/browse/FLINK-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352093#comment-17352093
]
Nicolas Ferrario commented on FLINK-18229:
------------------------------------------
Hey [~xintongsong], what's the status of this ticket? We just ran into this problem
on Native Kubernetes. We were using all resources available in the K8s cluster and
intentionally killed a pod (TM). Flink somehow requested 6 more TMs, but only one of
them succeeded, since there were no resources available for the rest. What's
interesting is that the old TMs never stopped, so they got reused after the job
recovered itself from the last checkpoint, but we were left with 5 TMs in Pending
state that will never go away unless we free some slots.
> Pending worker requests should be properly cleared
> --------------------------------------------------
>
> Key: FLINK-18229
> URL: https://issues.apache.org/jira/browse/FLINK-18229
> Project: Flink
> Issue Type: Sub-task
> Components: Deployment / Kubernetes, Deployment / YARN, Runtime /
> Coordination
> Affects Versions: 1.9.3, 1.10.1, 1.11.0
> Reporter: Xintong Song
> Priority: Major
> Fix For: 1.14.0
>
>
> Currently, if Kubernetes/Yarn does not have enough resources to fulfill
> Flink's resource requirements, there will be pending pod/container requests on
> Kubernetes/Yarn. These pending resource requests are never cleared until they
> are either fulfilled or the Flink cluster is shut down.
> However, sometimes Flink no longer needs the pending resources. E.g., the
> slot request is fulfilled by another slot that becomes available, or the
> job fails due to a slot request timeout (in a session cluster). In such cases,
> Flink does not remove the resource request until the resource is allocated;
> only then does it discover that it no longer needs the allocated resource and
> release it. This puts unnecessary load on the underlying Kubernetes/Yarn
> cluster, especially when the cluster is under heavy workload.
> It would be good for Flink to cancel pod/container requests as early as
> possible once it can discover that some of the pending workers are no longer
> needed.
> There are several approaches that could potentially achieve this.
> # We can always check whether there's a pending worker that can be canceled
> when a {{PendingTaskManagerSlot}} is unassigned.
> # We can have a separate timeout for requesting a new worker. If the resource
> cannot be allocated within the given time after being requested, we should
> cancel that resource request and report a resource allocation failure.
> # We can share the same timeout used for starting a new worker (proposed in
> FLINK-13554). This is similar to 2), but it requires the worker to be
> registered, rather than allocated, before the timeout.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)