[ https://issues.apache.org/jira/browse/YUNIKORN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Mishra updated YUNIKORN-3243:
-------------------------------------
    Description: 
When two sibling queues under the same parent have vastly different guaranteed 
resources (e.g., a 3600:1 ratio), the fair-share queue sorting in {{TryAllocate}} 
causes the smaller queue to be completely starved: its {{app.tryAllocate()}} 
is never called. This has two consequences:
 # *Scheduling starvation*: The smaller queue's asks are never evaluated for 
allocation, even when nodes have spare capacity.
 # *Autoscaler blindness*: {{SetSchedulingAttempted(true)}} is only called inside 
{{app.[tryAllocate|https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/application.go#L1029]()}}
 (line 1035 of {{application.go}}), so the starved queue's asks never get this 
flag. {{inspectOutstandingRequests}} skips them, and the cluster autoscaler 
(e.g., Karpenter) is never notified that capacity is needed.
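The starvation can be reproduced with a simplified, single-resource model. This is a sketch under stated assumptions, not yunikorn-core code: the `queue` type and the `share`, `tryAllocate` and `scheduleCycle` helpers are hypothetical, fair share is modelled as allocated/guaranteed, and the parent is assumed to stop iterating its children after the first successful allocation in a cycle. With the small queue holding exactly its guarantee (share 1.0), the large queue keeps sorting first until it has allocated its full 3600 units, so the small queue's `tryAllocate` (and with it the attempted flag) is never reached while the large queue has pending asks:

```go
package main

import (
	"fmt"
	"sort"
)

// queue is a hypothetical, single-resource stand-in for a leaf queue with one
// application; it is NOT the yunikorn-core Queue or Application type.
type queue struct {
	name       string
	guaranteed float64 // guaranteed resource
	allocated  float64 // resource currently allocated
	pending    int     // outstanding asks, each of size askSize
	askSize    float64
	attempted  bool // models SetSchedulingAttempted(true) on the queue's asks
}

// share models the fair-share score as allocated/guaranteed; lower sorts first.
func (q *queue) share() float64 {
	return q.allocated / q.guaranteed
}

// tryAllocate models app.tryAllocate(): visiting the queue marks its asks as
// attempted, then one ask is allocated if there is free capacity.
func (q *queue) tryAllocate(free *float64) bool {
	if q.pending == 0 {
		return false
	}
	q.attempted = true
	if *free < q.askSize {
		return false
	}
	*free -= q.askSize
	q.allocated += q.askSize
	q.pending--
	return true
}

// scheduleCycle models the parent's TryAllocate (assumed behaviour): sort the
// children by share and stop after the first successful allocation, so the
// remaining children are not visited in this cycle.
func scheduleCycle(children []*queue, free *float64) bool {
	sort.SliceStable(children, func(i, j int) bool {
		return children[i].share() < children[j].share()
	})
	for _, c := range children {
		if c.tryAllocate(free) {
			return true
		}
	}
	return false
}

func main() {
	free := 1000.0 // plenty of free capacity for every ask
	big := &queue{name: "big", guaranteed: 3600, pending: 100, askSize: 1}
	// The small queue already uses its full guarantee, so its share is 1.0.
	small := &queue{name: "small", guaranteed: 1, allocated: 1, pending: 5, askSize: 1}
	children := []*queue{big, small}

	for cycle := 0; cycle < 100; cycle++ {
		scheduleCycle(children, &free)
	}
	for _, q := range []*queue{big, small} {
		fmt.Printf("%s: allocated=%.0f pending=%d attempted=%v\n",
			q.name, q.allocated, q.pending, q.attempted)
	}
}
```

Running it prints `big: allocated=100 pending=0 attempted=true` and `small: allocated=1 pending=5 attempted=false`: the small queue's asks are still pending and were never marked as attempted, even though 900 units of capacity remain free.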

The second issue is the more critical one: even if scheduling is delayed, the 
autoscaler should still be able to provision nodes in parallel. With the current 
design, however, the autoscaler signal is gated on the scheduler actually 
visiting the queue.
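The gating can be sketched as a filter over pending asks. This is a hypothetical, simplified model, not the real `inspectOutstandingRequests`: the `pendingAsk` type and `outstandingForAutoscaler` function are illustrative names, and any additional checks the real implementation performs are omitted. It only shows that an ask whose queue was never visited, and therefore never had its attempted flag set, is never reported:

```go
package main

import "fmt"

// pendingAsk is a hypothetical stand-in for an outstanding allocation ask;
// it is NOT the yunikorn-core type.
type pendingAsk struct {
	queue     string
	attempted bool // models the flag set by SetSchedulingAttempted(true)
}

// outstandingForAutoscaler models the gate described in this issue: only asks
// already marked by a scheduling attempt are reported to the autoscaler.
func outstandingForAutoscaler(asks []pendingAsk) []pendingAsk {
	var out []pendingAsk
	for _, a := range asks {
		if !a.attempted {
			continue // never visited by tryAllocate, so invisible to the autoscaler
		}
		out = append(out, a)
	}
	return out
}

func main() {
	asks := []pendingAsk{
		{queue: "big", attempted: true},    // visited by the scheduler
		{queue: "small", attempted: false}, // starved queue, never visited
		{queue: "small", attempted: false},
	}
	for _, a := range outstandingForAutoscaler(asks) {
		fmt.Println("reported to autoscaler:", a.queue)
	}
}
```

Only `reported to autoscaler: big` is printed; both asks from the starved queue stay hidden from the autoscaler regardless of how long they have been pending.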

  was:
When two sibling queues under the same parent have vastly different guaranteed 
resources (e.g., 3600:1 ratio), the fair-share queue sorting in {{TryAllocate}} 
causes the smaller queue to be completely starved — its {{app.tryAllocate()}} 
is never called. This has two consequences:
 # *Scheduling starvation*: The smaller queue's asks are never evaluated 
for allocation, even when nodes have capacity.
 # *Autoscaler blindness*: Because {{SetSchedulingAttempted(true)}} is only 
set inside {{app.tryAllocate()}} (line 1035 of {{application.go}}), the 
starved queue's asks never get this flag. {{inspectOutstandingRequests}} skips 
them, so the cluster autoscaler (e.g., Karpenter) is never notified that 
capacity is needed.

The second issue is the more critical one — even if scheduling is delayed, the 
autoscaler should be able to provision nodes in parallel. But with the current 
design, the autoscaler signal is gated on queue visitation.


> Fair-share queue sorting causes starvation of sibling queues with asymmetric 
> guaranteed resources
> -------------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-3243
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3243
>             Project: Apache YuniKorn
>          Issue Type: Bug
>            Reporter: Shubham Mishra
>            Assignee: Shubham Mishra
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
