[
https://issues.apache.org/jira/browse/YUNIKORN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774755#comment-17774755
]
Wilfred Spiegelenburg commented on YUNIKORN-2030:
-------------------------------------------------
There will never be more than one allocation in progress at the same time.
Allocation processing is by nature single threaded. There are a number of
points in the allocation process that make running them in parallel difficult.
We currently do not need it either: performance is more than good enough with a
single goroutine.
The message reported in the details is most likely the result of
the race condition that was fixed in YUNIKORN-1993. The race condition causes
the allocated resources of the queue(s) to not be updated correctly. When you
are in a state like that it will not resolve itself until you restart.
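
To make the numbers in the reported warning concrete, here is a minimal Go
sketch that redoes the limit check with the values copied from the log message
in the details. The Resource type and the overLimit helper are hypothetical
simplifications for illustration only, not the YuniKorn core types or API.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical, simplified resource map (name -> quantity).
// It is not the YuniKorn core type.
type Resource map[string]int64

// overLimit returns the sorted names of the resources for which
// usage + ask would exceed the limit set in maxAlloc. Resources without
// an entry in maxAlloc (such as pods here) are treated as unlimited.
func overLimit(usage, ask, maxAlloc Resource) []string {
	var over []string
	for name, limit := range maxAlloc {
		if usage[name]+ask[name] > limit {
			over = append(over, name)
		}
	}
	sort.Strings(over)
	return over
}

func main() {
	// Values copied from the warning quoted in the issue details.
	ask := Resource{"memory": 37580963840, "pods": 1, "vcore": 2000}
	maxAlloc := Resource{"memory": 3300011278336, "vcore": 390584}
	usage := Resource{"memory": 3291983380480, "pods": 91, "vcore": 186000}

	fmt.Println("memory headroom:", maxAlloc["memory"]-usage["memory"])
	fmt.Println("memory ask:     ", ask["memory"])
	fmt.Println("over limit:     ", overLimit(usage, ask, maxAlloc))
}
```

With the usage shown in the log, the memory headroom is 8027897856 while the
ask needs 37580963840, so only memory goes over the limit. A headroom check
against that same usage would have rejected the ask, which would be consistent
with the headroom check and the queue update seeing different accounting.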
> Check Headroom checking doesn't prevent failure to allocate resource due to
> max resource limit exceeded
> -------------------------------------------------------------------------------------------------------
>
> Key: YUNIKORN-2030
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2030
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Priority: Major
>
> As reported in YUNIKORN-1996, we are seeing many messages like below from
> time to time:
> {code:java}
> WARN objects/application.go:1504 queue update failed unexpectedly
> {"error": "allocation (map[memory:37580963840 pods:1 vcore:2000]) puts
> queue 'root.test-queue' over maximum allocation (map[memory:3300011278336
> vcore:390584]), current usage (map[memory:3291983380480 pods:91
> vcore:186000])"}{code}
> Restarting YuniKorn stops it. Creating this Jira to investigate
> why it happened, because it's not supposed to happen as we check if there is
> enough resource headroom before calling
>
> {code:java}
> func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation
> {code}
> which printed the above message, and only call it when there is enough
> headroom.
> There may be a bug in headroom checking?
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)