Peter Bacsko created YUNIKORN-2370:
--------------------------------------

             Summary: Handle events when headroom checks fail on a per-request basis
                 Key: YUNIKORN-2370
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2370
             Project: Apache YuniKorn
          Issue Type: Sub-task
          Components: core - scheduler
            Reporter: Peter Bacsko


Currently, we have this code inside tryAllocate:

{noformat}
func (sa *Application) tryAllocate(headRoom *resources.Resource, allowPreemption bool, preemptionDelay time.Duration, preemptAttemptsRemaining *int, nodeIterator func() NodeIterator, fullNodeIterator func() NodeIterator, getNodeFn func(string) *Node) *Allocation {
        sa.Lock()
        defer sa.Unlock()
        if sa.sortedRequests == nil {
                return nil
        }
        // calculate the users' headroom, includes group check which requires the applicationID
        userHeadroom := ugm.GetUserManager().Headroom(sa.queuePath, sa.ApplicationID, sa.user)
        // get all the requests from the app sorted in order
        for _, request := range sa.sortedRequests {
                if request.GetPendingAskRepeat() == 0 {
                        continue
                }
                // check if there is a replacement possible
                if sa.canReplace(request) {
                        continue
                }
                // check if this fits in the users' headroom first, if that fits check the queues' headroom
                // NOTE: preemption most likely will not help in this case. The chance that preemption helps is small
                // as the preempted allocation must be for the same user in a different queue in the hierarchy...
                if !userHeadroom.FitInMaxUndef(request.GetAllocatedResource()) {
                        continue
                }

                // resource must fit in headroom otherwise skip the request (unless preemption could help)
                if !headRoom.FitInMaxUndef(request.GetAllocatedResource()) {
                        // attempt preemption
                        if allowPreemption && *preemptAttemptsRemaining > 0 {
                                *preemptAttemptsRemaining--
                                fullIterator := fullNodeIterator()
                                if fullIterator != nil {
                                        if alloc, ok := sa.tryPreemption(headRoom, preemptionDelay, request, fullIterator, false); ok {
                                                // preemption occurred, and possibly reservation
                                                return alloc
                                        }
                                }
                        }
                        sa.appEvents.sendAppDoesNotFitEvent(request, headRoom) // <--- event
                        continue
                }
{noformat}

There are issues with this approach:
1. We say "the application doesn't fit" when it's really the request that 
doesn't fit.
2. If there's no quota at all, then a request gets its own event, but the rest 
don't.

Suggested approach:
1. Have a per-request event.
2. When an event is sent (e.g. failed user headroom) for a given request, 
remember it and don't send it again.
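
The suggested approach could look roughly like the sketch below. It is only an illustration under stated assumptions, not the YuniKorn implementation: the types request, eventSink and headroomKind and the method sendRequestDoesNotFitEvent are hypothetical stand-ins, and the sketch assumes the caller already holds the application lock, as tryAllocate does.

```go
package main

import "fmt"

// headroomKind names which headroom check rejected a request.
// Hypothetical type, not part of YuniKorn.
type headroomKind int

const (
	userHeadroom headroomKind = iota
	queueHeadroom
)

func (k headroomKind) String() string {
	if k == userHeadroom {
		return "user"
	}
	return "queue"
}

// request is a simplified stand-in for a scheduling request.
type request struct {
	key      string
	reported map[headroomKind]bool // failures that already produced an event
}

func newRequest(key string) *request {
	return &request{key: key, reported: make(map[headroomKind]bool)}
}

// eventSink collects events in memory; a stand-in for the real event system.
type eventSink struct {
	events []string
}

// sendRequestDoesNotFitEvent records at most one event per request and
// headroom kind. It returns true if an event was actually recorded.
// No locking is done here: the caller is assumed to hold the needed lock.
func (s *eventSink) sendRequestDoesNotFitEvent(r *request, kind headroomKind) bool {
	if r.reported[kind] {
		return false // already reported for this request, stay silent
	}
	r.reported[kind] = true
	s.events = append(s.events,
		fmt.Sprintf("request %s does not fit in %s headroom", r.key, kind))
	return true
}

func main() {
	sink := &eventSink{}
	requests := []*request{newRequest("ask-1"), newRequest("ask-2")}
	// Three scheduling cycles in which every request fails the user headroom check.
	for cycle := 0; cycle < 3; cycle++ {
		for _, r := range requests {
			sink.sendRequestDoesNotFitEvent(r, userHeadroom)
		}
	}
	for _, e := range sink.events {
		fmt.Println(e)
	}
}
```

In this sketch every request gets exactly one event per failed check, however many scheduling cycles it is retried in. Whether the remembered state should be cleared once the request fits again is left open here.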


