[ https://issues.apache.org/jira/browse/FLINK-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130446#comment-17130446 ]

Till Rohrmann commented on FLINK-18229:
---------------------------------------

I think we should separate the problem into two parts: 

1) The SlotManager adjusts its resource needs (e.g., a returned slot fulfills a 
pending slot request so that another TM is no longer needed, or a slot request 
times out)
2) The ResourceManager cannot fulfill the resource needs (e.g., there are not 
enough resources available, or starting a TM hangs as in FLINK-13554)

For 1) I would like to keep this ticket and for 2) we can use FLINK-13554.

In order to solve 1) I think your first proposal sounds good. Whenever the 
{{SlotManager}} realizes that the set of required resources changes, it should 
tell the {{ResourceManager}} about it so that it can adjust the current set of 
pending pod/container requests. 

The open question is whether we want to introduce a 
{{ResourceActions#cancelResourceRequest(WorkerResourceSpec)}} method, or whether 
we want to make {{ResourceActions}} more declarative, in the sense that the 
{{SlotManager}} declares the total set of resources required to fulfill all 
pending and fulfilled slot requests.
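
To make the two options concrete, here is a rough sketch (not Flink's actual 
interfaces) of what the two shapes of {{ResourceActions}} could look like; 
{{WorkerResourceSpec}} is Flink's existing class, everything else is purely 
illustrative:

{code:java}
// Rough sketch only, not Flink's actual interfaces. WorkerResourceSpec is
// Flink's description of a worker's resources (assumed on the classpath);
// the two interfaces below just contrast the imperative and declarative styles.
import org.apache.flink.runtime.resourcemanager.WorkerResourceSpec;

import java.util.Collection;

interface ImperativeResourceActions {
    // Current style: the SlotManager issues individual allocate/cancel calls.
    void allocateResource(WorkerResourceSpec workerResourceSpec);

    // The method discussed above that would have to be added for cancellation.
    void cancelResourceRequest(WorkerResourceSpec workerResourceSpec);
}

interface DeclarativeResourceActions {
    // Declarative style: the SlotManager states its total requirement and the
    // ResourceManager reconciles its pending pod/container requests against it.
    void declareResourceNeeds(Collection<WorkerResourceSpec> totalRequiredResources);
}
{code}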

For the latter, one could introduce 
{{ResourceActions#declareResourceNeeds(Collection<WorkerResourceSpec>)}}, which 
could subsume {{ResourceActions#allocateResource}}, 
{{ResourceActions#releaseResource}} and 
{{ResourceActions#cancelResourceRequest(WorkerResourceSpec)}}, because the 
{{SlotManager}} would simply tell the {{ResourceManager}} how many resources it 
needs in total and the {{ResourceManager}} could then decide how to fulfill them 
(reconcile the current state with the desired state). In order not to release a 
resource that is still in use when the required count decreases, we would 
probably also have to provide the set of resources that are still in use.
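
As a hedged illustration of the reconciliation idea (not a concrete 
implementation proposal), the {{ResourceManager}} could compare the declared 
needs against its pending requests along these lines. The 
{{declareResourceNeeds}} signature follows the proposal above (extended with the 
in-use set); apart from {{WorkerResourceSpec}}, every other name is a 
hypothetical placeholder:

{code:java}
// Hedged sketch of the reconciliation step; not a concrete implementation
// proposal. WorkerResourceSpec is Flink's class; the class, field and helper
// names below are hypothetical placeholders.
import org.apache.flink.runtime.resourcemanager.WorkerResourceSpec;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class DeclarativeResourceManagerSketch {

    // Number of not-yet-fulfilled pod/container requests per worker spec.
    private final Map<WorkerResourceSpec, Integer> pendingRequests = new HashMap<>();

    void declareResourceNeeds(
            Collection<WorkerResourceSpec> totalRequired,
            Collection<WorkerResourceSpec> stillInUse) {

        // Desired number of *new* workers per spec = total needed minus the
        // workers that are already allocated and still in use.
        Map<WorkerResourceSpec, Integer> desired = countBySpec(totalRequired);
        countBySpec(stillInUse).forEach((spec, n) -> desired.merge(spec, -n, Integer::sum));

        // Also visit specs that only have pending requests, so surplus
        // requests get cancelled.
        pendingRequests.keySet().forEach(spec -> desired.putIfAbsent(spec, 0));

        // Reconcile the current state (pending requests) with the desired state.
        for (Map.Entry<WorkerResourceSpec, Integer> entry : desired.entrySet()) {
            int pending = pendingRequests.getOrDefault(entry.getKey(), 0);
            int diff = Math.max(entry.getValue(), 0) - pending;
            if (diff > 0) {
                requestNewWorkers(entry.getKey(), diff);
            } else if (diff < 0) {
                cancelPendingRequests(entry.getKey(), -diff);
            }
        }
    }

    private static Map<WorkerResourceSpec, Integer> countBySpec(Collection<WorkerResourceSpec> specs) {
        Map<WorkerResourceSpec, Integer> counts = new HashMap<>();
        specs.forEach(spec -> counts.merge(spec, 1, Integer::sum));
        return counts;
    }

    private void requestNewWorkers(WorkerResourceSpec spec, int count) {
        // would ask Kubernetes/Yarn for 'count' more pods/containers of this spec
        pendingRequests.merge(spec, count, Integer::sum);
    }

    private void cancelPendingRequests(WorkerResourceSpec spec, int count) {
        // would cancel 'count' pending pod/container requests of this spec
        pendingRequests.merge(spec, -count, Integer::sum);
    }
}
{code}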

> Pending worker requests should be properly cleared
> --------------------------------------------------
>
>                 Key: FLINK-18229
>                 URL: https://issues.apache.org/jira/browse/FLINK-18229
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes, Deployment / YARN, Runtime / 
> Coordination
>    Affects Versions: 1.9.3, 1.10.1, 1.11.0
>            Reporter: Xintong Song
>            Priority: Major
>             Fix For: 1.12.0
>
>
> Currently, if Kubernetes/Yarn does not have enough resources to fulfill 
> Flink's resource requirements, there will be pending pod/container requests on 
> Kubernetes/Yarn. These pending requests are never cleared until they are 
> either fulfilled or the Flink cluster is shut down.
> However, sometimes Flink no longer needs the pending resources. E.g., the 
> slot request is fulfilled by another slot that becomes available, or the 
> job fails due to a slot request timeout (in a session cluster). In such cases, 
> Flink does not remove the resource request until the resource is allocated, 
> and only then discovers that it no longer needs the allocated resource and 
> releases it. This puts unnecessary load on the underlying Kubernetes/Yarn 
> cluster, especially when the cluster is under heavy workload.
> It would be good for Flink to cancel pod/container requests as early as 
> possible once it discovers that some of the pending workers are no longer 
> needed.
> There are several approaches that could potentially achieve this.
>  # We can always check whether there is a pending worker that can be canceled 
> when a {{PendingTaskManagerSlot}} is unassigned.
>  # We can have a separate timeout for requesting a new worker. If the resource 
> cannot be allocated within the given time after it was requested, we cancel 
> that resource request and declare a resource allocation failure.
>  # We can share the same timeout as for starting a new worker (proposed in 
> FLINK-13554). This is similar to 2), but it requires the worker to be 
> registered, rather than just allocated, before the timeout.
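
For illustration only, a minimal sketch of approach 1 from the description 
above; the class and helper methods are hypothetical placeholders, not Flink's 
actual {{SlotManager}} code:

{code:java}
// For illustration only: a minimal sketch of approach 1 above. The class and
// helper methods are hypothetical, not Flink's actual SlotManager code.
import org.apache.flink.runtime.resourcemanager.WorkerResourceSpec;

class PendingWorkerCancellationSketch {

    /** Called when a PendingTaskManagerSlot loses its assigned slot request. */
    void onPendingSlotUnassigned(WorkerResourceSpec specOfPendingWorker) {
        // If no remaining pending slot request could be served by a worker of
        // this spec, cancel the pod/container request right away instead of
        // waiting for the worker to start and then releasing it.
        if (!hasPendingSlotRequestMatching(specOfPendingWorker)) {
            cancelPendingWorkerRequest(specOfPendingWorker);
        }
    }

    private boolean hasPendingSlotRequestMatching(WorkerResourceSpec spec) {
        return false; // placeholder
    }

    private void cancelPendingWorkerRequest(WorkerResourceSpec spec) {
        // would go through something like the proposed
        // ResourceActions#cancelResourceRequest
    }
}
{code}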


