Yongjun Zhang created YUNIKORN-2030:
---------------------------------------

             Summary: Check Headroom checking doesn't prevent failure to 
allocate resource due to max resource limit exceeded
                 Key: YUNIKORN-2030
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2030
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
            Reporter: Yongjun Zhang
            Assignee: Yongjun Zhang


As reported in YUNIKORN-1996, we see many messages like the one below from time 
to time:
{code:java}
 WARN    objects/application.go:1504     queue update failed unexpectedly
 {"error": "allocation (map[memory:37580963840 pods:1 vcore:2000]) puts queue 
'root.test-queue' over maximum allocation (map[memory:3300011278336 
vcore:390584]), current usage (map[memory:3291983380480 pods:91 
vcore:186000])"}{code}
Restarting YuniKorn stops the messages. I am creating this Jira to investigate 
why this happens: it is not supposed to, because we check that there is enough 
resource headroom before calling

{code:java}
func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation 
{code}
which printed the above message, and we only call it when there is enough headroom.

Could there be a bug in the headroom check?
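
For reference, the numbers in the log message above can be checked with a small standalone sketch. It is only an illustration: the Resource type and the headroom and exceeded helpers are hypothetical names, not YuniKorn's actual code, and the sketch assumes that a resource type missing from the maximum (here pods) is unlimited.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for a scheduler resource map;
// it is not YuniKorn's actual resource type.
type Resource map[string]int64

// headroom returns maxRes minus used for every resource type that maxRes limits.
// Resource types absent from maxRes are assumed unlimited and are omitted.
func headroom(maxRes, used Resource) Resource {
	h := Resource{}
	for name, limit := range maxRes {
		h[name] = limit - used[name]
	}
	return h
}

// exceeded lists the resource types for which ask does not fit in the headroom h.
func exceeded(h, ask Resource) []string {
	var over []string
	for name, want := range ask {
		if free, limited := h[name]; limited && want > free {
			over = append(over, name)
		}
	}
	return over
}

func main() {
	// Values copied from the log message.
	maxRes := Resource{"memory": 3300011278336, "vcore": 390584}
	used := Resource{"memory": 3291983380480, "pods": 91, "vcore": 186000}
	ask := Resource{"memory": 37580963840, "pods": 1, "vcore": 2000}

	h := headroom(maxRes, used)
	fmt.Println("headroom:", h)                // headroom: map[memory:8027897856 vcore:204584]
	fmt.Println("exceeded:", exceeded(h, ask)) // exceeded: [memory]
}
```

With these values only memory is short: the ask needs 37580963840 but just 8027897856 is left, while vcore fits easily. A headroom check evaluated against this same usage would be expected to reject the ask, which is what makes the warning surprising.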

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
