[ 
https://issues.apache.org/jira/browse/YUNIKORN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18067798#comment-18067798
 ] 

Shubham Mishra edited comment on YUNIKORN-3243 at 3/23/26 6:47 PM:
-------------------------------------------------------------------

Just to clarify:

The bug is not in _how_ sorting works at each level. It's that:

1. The root-level sort correctly computes DRF for sys-default vs uip.

2. The greedy first-success return
{code:go}
if result != nil { return result }
{code}
means *only the first child that succeeds gets any work done per cycle.*

3. With a 3,600:1 guaranteed ratio and any non-zero prior allocation on the 
small queue, the small queue sorts last for thousands of cycles.

Correct per-level DRF + greedy first-success + extreme ratio = correct 
algorithm, but wrong outcome. The scheduler is working exactly as designed; 
{_}the design breaks down at ratios this extreme{_}.


> Fair-share queue sorting causes starvation of sibling queues with asymmetric 
> guaranteed resources
> -------------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-3243
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3243
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Shubham Mishra
>            Assignee: Shubham Mishra
>            Priority: Major
>             Fix For: 1.6.1
>
>
> When two sibling queues under the same parent have vastly different 
> guaranteed resources (e.g., 3600:1 ratio), the fair-share queue sorting in 
> {{TryAllocate}} causes the smaller queue to be completely starved — its 
> {{app.tryAllocate()}} is never called. This has two consequences:
> {*}1. Scheduling starvation{*}: The smaller queue's asks are never evaluated 
> for allocation, even when nodes have capacity.
> {*}2. Autoscaler blindness{*}: Because 
> {{[SetSchedulingAttempted|https://github.com/apache/yunikorn-core/blob/cb7f2381b6098f8936fe57dd7f13f205939a0021/pkg/scheduler/objects/application.go#L1065](true)}}
>  is only set inside {{app.tryAllocate()}} (line 1065 of 
> {{{}application.go{}}}), the starved queue's asks never get this flag. 
> {{inspectOutstandingRequests}} skips them, so the cluster autoscaler (e.g., 
> Karpenter) is never notified that capacity is needed.
> The second issue is the more critical one — even if scheduling is delayed, 
> the autoscaler should be able to provision nodes in parallel. But with the 
> current design, the autoscaler signal is gated on queue visitation.
> Unit tests that reproduce this issue:
> [https://github.com/apache/yunikorn-core/pull/1077]


