[ 
https://issues.apache.org/jira/browse/YUNIKORN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Mishra updated YUNIKORN-3243:
-------------------------------------
    Description: 
When two sibling queues under the same parent have vastly different guaranteed 
resources (e.g., 3600:1 ratio), the fair-share queue sorting in {{TryAllocate}} 
causes the smaller queue to be completely starved — its {{app.tryAllocate()}} 
is never called. This has two consequences:
 # {*}Scheduling starvation{*}: The smaller queue's asks are never evaluated 
for allocation, even when nodes have capacity.

 # {*}Autoscaler blindness{*}: Because 
{{SetSchedulingAttempted(true)}} 
([source|https://github.com/apache/yunikorn-core/blob/cb7f2381b6098f8936fe57dd7f13f205939a0021/pkg/scheduler/objects/application.go#L1065])
 is only called inside {{app.tryAllocate()}} (line 1065 of 
{{application.go}}), the starved queue's asks never get this flag. 
{{inspectOutstandingRequests}} skips them, so the cluster autoscaler (e.g., 
Karpenter) is never notified that capacity is needed.

The second issue is the more critical one — even if scheduling is delayed, the 
autoscaler should be able to provision nodes in parallel. But with the current 
design, the autoscaler signal is gated on queue visitation.
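
The two consequences above can be illustrated with a small, self-contained Go sketch. It is a toy model written under stated assumptions, not YuniKorn's code: the {{queue}} and {{ask}} types and the {{scheduleOnce}} and {{outstanding}} functions are hypothetical names, the fair-share key is assumed to be usage divided by guaranteed resources, and a scheduling cycle is assumed to stop at the first successful allocation.

```go
package main

import (
	"fmt"
	"sort"
)

// ask is a hypothetical stand-in for a pending allocation request.
type ask struct {
	name                string
	schedulingAttempted bool
}

// queue is a hypothetical, heavily simplified leaf queue.
type queue struct {
	name       string
	guaranteed float64
	allocated  float64
	pending    []*ask
}

// share is the sort key assumed here: usage relative to the guarantee.
func (q *queue) share() float64 { return q.allocated / q.guaranteed }

// tryAllocate marks the first pending ask as attempted and allocates it.
// In this toy model the flag is set nowhere else.
func (q *queue) tryAllocate() bool {
	if len(q.pending) == 0 {
		return false
	}
	a := q.pending[0]
	a.schedulingAttempted = true
	q.pending = q.pending[1:]
	q.allocated++
	return true
}

// scheduleOnce visits siblings in ascending share order and returns
// after the first successful allocation (an assumption of this sketch).
func scheduleOnce(siblings []*queue) {
	sort.SliceStable(siblings, func(i, j int) bool {
		return siblings[i].share() < siblings[j].share()
	})
	for _, q := range siblings {
		if q.tryAllocate() {
			return
		}
	}
}

// outstanding mimics an inspection pass that reports only asks whose
// schedulingAttempted flag is set; unvisited asks are skipped.
func outstanding(siblings []*queue) []string {
	var names []string
	for _, q := range siblings {
		for _, a := range q.pending {
			if a.schedulingAttempted {
				names = append(names, a.name)
			}
		}
	}
	return names
}

func main() {
	large := &queue{name: "large", guaranteed: 3600,
		pending: []*ask{{name: "l-1"}, {name: "l-2"}, {name: "l-3"}}}
	small := &queue{name: "small", guaranteed: 1, allocated: 1,
		pending: []*ask{{name: "s-1"}}}
	siblings := []*queue{small, large}

	for i := 0; i < 3; i++ {
		scheduleOnce(siblings)
	}
	fmt.Println("small ask attempted:", small.pending[0].schedulingAttempted)
	fmt.Println("asks reported for scale-up:", len(outstanding(siblings)))
}
```

In this toy run the large queue sorts first in every cycle because its share stays far below the small queue's, so the small queue's ask is never attempted and the inspection pass reports nothing for it.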

Unit tests that reproduce this: 
[https://github.com/apache/yunikorn-core/pull/1077]

  was:
When two sibling queues under the same parent have vastly different guaranteed 
resources (e.g., 3600:1 ratio), the fair-share queue sorting in {{TryAllocate}} 
causes the smaller queue to be completely starved — its {{app.tryAllocate()}} 
is never called. This has two consequences:
 # {*}Scheduling starvation{*}: The smaller queue's asks are never evaluated 
for allocation, even when nodes have capacity.

 # {*}Autoscaler blindness{*}: Because {{SetSchedulingAttempted(true)}} is only 
set inside {{app.tryAllocate()}} (line 1035 of {{{}application.go{}}}), the 
starved queue's asks never get this flag. {{inspectOutstandingRequests}} skips 
them, so the cluster autoscaler (e.g., Karpenter) is never notified that 
capacity is needed.

The second issue is the more critical one — even if scheduling is delayed, the 
autoscaler should be able to provision nodes in parallel. But with the current 
design, the autoscaler signal is gated on queue visitation.

Here are the unit-tests to reproduce this - 
https://issues.apache.org/jira/browse/YUNIKORN-3243
h3.  


> Fair-share queue sorting causes starvation of sibling queues with asymmetric 
> guaranteed resources
> -------------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-3243
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3243
>             Project: Apache YuniKorn
>          Issue Type: Bug
>            Reporter: Shubham Mishra
>            Assignee: Shubham Mishra
>            Priority: Major
>
> When two sibling queues under the same parent have vastly different 
> guaranteed resources (e.g., 3600:1 ratio), the fair-share queue sorting in 
> {{TryAllocate}} causes the smaller queue to be completely starved — its 
> {{app.tryAllocate()}} is never called. This has two consequences:
>  # {*}Scheduling starvation{*}: The smaller queue's asks are never evaluated 
> for allocation, even when nodes have capacity.
>  # {*}Autoscaler blindness{*}: Because 
> {{SetSchedulingAttempted(true)}} 
> ([source|https://github.com/apache/yunikorn-core/blob/cb7f2381b6098f8936fe57dd7f13f205939a0021/pkg/scheduler/objects/application.go#L1065])
>  is only called inside {{app.tryAllocate()}} (line 1065 of 
> {{application.go}}), the starved queue's asks never get this flag. 
> {{inspectOutstandingRequests}} skips them, so the cluster autoscaler (e.g., 
> Karpenter) is never notified that capacity is needed.
> The second issue is the more critical one — even if scheduling is delayed, 
> the autoscaler should be able to provision nodes in parallel. But with the 
> current design, the autoscaler signal is gated on queue visitation.
> Unit tests that reproduce this: 
> [https://github.com/apache/yunikorn-core/pull/1077]



