Peter Bacsko created YUNIKORN-2370:
--------------------------------------
Summary: Handle events when headroom checks fail on a per-request
basis
Key: YUNIKORN-2370
URL: https://issues.apache.org/jira/browse/YUNIKORN-2370
Project: Apache YuniKorn
Issue Type: Sub-task
Components: core - scheduler
Reporter: Peter Bacsko
Currently, we have this code inside tryAllocate:
{noformat}
func (sa *Application) tryAllocate(headRoom *resources.Resource, allowPreemption bool,
	preemptionDelay time.Duration, preemptAttemptsRemaining *int,
	nodeIterator func() NodeIterator, fullNodeIterator func() NodeIterator,
	getNodeFn func(string) *Node) *Allocation {
	sa.Lock()
	defer sa.Unlock()
	if sa.sortedRequests == nil {
		return nil
	}
	// calculate the users' headroom, includes group check which requires the applicationID
	userHeadroom := ugm.GetUserManager().Headroom(sa.queuePath, sa.ApplicationID, sa.user)
	// get all the requests from the app sorted in order
	for _, request := range sa.sortedRequests {
		if request.GetPendingAskRepeat() == 0 {
			continue
		}
		// check if there is a replacement possible
		if sa.canReplace(request) {
			continue
		}
		// check if this fits in the users' headroom first, if that fits check the queues' headroom
		// NOTE: preemption most likely will not help in this case. The chance that preemption helps is small
		// as the preempted allocation must be for the same user in a different queue in the hierarchy...
		if !userHeadroom.FitInMaxUndef(request.GetAllocatedResource()) {
			continue
		}
		// resource must fit in headroom otherwise skip the request (unless preemption could help)
		if !headRoom.FitInMaxUndef(request.GetAllocatedResource()) {
			// attempt preemption
			if allowPreemption && *preemptAttemptsRemaining > 0 {
				*preemptAttemptsRemaining--
				fullIterator := fullNodeIterator()
				if fullIterator != nil {
					if alloc, ok := sa.tryPreemption(headRoom, preemptionDelay, request, fullIterator, false); ok {
						// preemption occurred, and possibly reservation
						return alloc
					}
				}
			}
			sa.appEvents.sendAppDoesNotFitEvent(request, headRoom) // <--- event
			continue
		}
{noformat}
There are issues with this approach:
1. We say "the application doesn't fit" while it's really the request that
doesn't fit.
2. If there's no quota at all, then a request gets its own event, but the rest
don't.
Suggested approach:
1. Have a per-request event.
2. When an event is sent (e.g. failed user headroom) for a given request,
remember it and don't send it again.
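
A rough sketch of how the "remember it and don't send it again" part could look.
This is only an illustration, not the proposed patch: requestEventTracker,
failureReason, sendOnce and forget are hypothetical names that do not exist in
the YuniKorn code base, and the emit callback stands in for whatever the real
event system would call. It assumes the caller already holds the application
lock (as tryAllocate does), so the tracker does no locking of its own.

```go
package main

import "fmt"

// failureReason is a hypothetical label for why a request was skipped.
type failureReason string

const (
	userHeadroomExceeded  failureReason = "request does not fit in user headroom"
	queueHeadroomExceeded failureReason = "request does not fit in queue headroom"
)

// requestEventTracker remembers which (request, reason) pairs have already
// produced an event, so that each pair is reported only once.
// Not safe for concurrent use: the caller is assumed to hold a lock.
type requestEventTracker struct {
	sent map[string]map[failureReason]struct{}
	emit func(allocationKey string, reason failureReason)
}

func newRequestEventTracker(emit func(allocationKey string, reason failureReason)) *requestEventTracker {
	return &requestEventTracker{
		sent: make(map[string]map[failureReason]struct{}),
		emit: emit,
	}
}

// sendOnce emits an event for the request unless the same reason was already
// reported for it. It returns true if an event was emitted.
func (t *requestEventTracker) sendOnce(allocationKey string, reason failureReason) bool {
	reasons, ok := t.sent[allocationKey]
	if !ok {
		reasons = make(map[failureReason]struct{})
		t.sent[allocationKey] = reasons
	}
	if _, seen := reasons[reason]; seen {
		return false
	}
	reasons[reason] = struct{}{}
	t.emit(allocationKey, reason)
	return true
}

// forget drops the remembered state for a request, for example once the
// request has been allocated or removed, so a later failure is reported again.
func (t *requestEventTracker) forget(allocationKey string) {
	delete(t.sent, allocationKey)
}

func main() {
	tracker := newRequestEventTracker(func(key string, reason failureReason) {
		fmt.Printf("request %s: %s\n", key, reason)
	})
	// Simulate three scheduling cycles that keep failing the same checks.
	for cycle := 0; cycle < 3; cycle++ {
		tracker.sendOnce("ask-1", userHeadroomExceeded)
		tracker.sendOnce("ask-2", queueHeadroomExceeded)
	}
}
```

In the example, each request is reported once even though the loop runs three
times. Keying on both the request and the reason means a request that first
fails the user headroom check and later fails the queue headroom check still
gets a second, distinct event.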