Yongjun Zhang created YUNIKORN-2030:
---------------------------------------
Summary: Check Headroom checking doesn't prevent failure to
allocate resource due to max resource limit exceeded
Key: YUNIKORN-2030
URL: https://issues.apache.org/jira/browse/YUNIKORN-2030
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
As reported in YUNIKORN-1996, we see many messages like the one below from time
to time:
{code:java}
WARN objects/application.go:1504 queue update failed unexpectedly
{"error": "allocation (map[memory:37580963840 pods:1 vcore:2000]) puts queue
'root.test-queue' over maximum allocation (map[memory:3300011278336
vcore:390584]), current usage (map[memory:3291983380480 pods:91
vcore:186000])"}{code}
Restarting YuniKorn stops the messages. I am creating this Jira to investigate
why this happens, because it is not supposed to: we check whether there is
enough resource headroom before calling
{code:java}
func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation
{code}
which printed the above message, and we only call it when there is enough
headroom. There may be a bug in the headroom check.
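For illustration, the numbers from the warning above can be replayed through a simplified headroom check. This is only a sketch: the Resource type and the headroom and fitsIn helpers are hypothetical names, not YuniKorn's actual code, and it assumes that a resource type missing from the queue maximum (pods here) is unlimited. Under those assumptions the memory headroom is smaller than the ask, so a correct check against up-to-date usage should have rejected this ask before tryNode was called.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for a scheduler resource map
// (resource name -> quantity). It is not YuniKorn's actual type.
type Resource map[string]int64

// headroom returns limit - used for every resource type present in limit.
// Types absent from limit are assumed to be unlimited and are left out.
func headroom(limit, used Resource) Resource {
	room := Resource{}
	for name, maxQty := range limit {
		room[name] = maxQty - used[name]
	}
	return room
}

// fitsIn reports whether ask fits within room.
// Types that do not appear in room are treated as unlimited.
func fitsIn(room, ask Resource) bool {
	for name, want := range ask {
		if avail, limited := room[name]; limited && want > avail {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the warning message in this issue.
	limit := Resource{"memory": 3300011278336, "vcore": 390584}
	used := Resource{"memory": 3291983380480, "pods": 91, "vcore": 186000}
	ask := Resource{"memory": 37580963840, "pods": 1, "vcore": 2000}

	room := headroom(limit, used)
	fmt.Println("memory headroom:", room["memory"]) // 8027897856
	fmt.Println("vcore headroom:", room["vcore"])   // 204584
	fmt.Println("ask fits:", fitsIn(room, ask))     // false
}
```

The ask needs 37580963840 bytes of memory while only 8027897856 remain, so the failure is on memory alone; vcore has ample room.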