[
https://issues.apache.org/jira/browse/YUNIKORN-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886295#comment-17886295
]
Wilfred Spiegelenburg commented on YUNIKORN-2895:
-------------------------------------------------
There is something more broken. We should never see an ask that has already
been allocated at that point in the scheduling cycle. The application is
write locked a number of function calls before we get here, and it remains
locked. We filter all allocations inside that lock and only proceed with the
ones that have not been allocated.
The allocation should only be manipulated through the application, which
means the allocated flag cannot change while scheduling is in progress.
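To make that concrete, here is a minimal sketch of the locking pattern; the
names ({{Application}}, {{Ask}}, {{tryAllocate}}) are illustrative only, not
the actual core identifiers:
{code:go}
package scheduler

import "sync"

type Ask struct {
	allocationKey string
	allocated     bool
}

type Application struct {
	sync.RWMutex
	requests map[string]*Ask
}

// tryAllocate takes the write lock before filtering, so the allocated
// flag on an ask cannot flip while the scheduling cycle is in progress.
func (a *Application) tryAllocate() []*Ask {
	a.Lock()
	defer a.Unlock()
	var outstanding []*Ask
	for _, ask := range a.requests {
		if !ask.allocated { // filter inside the lock
			outstanding = append(outstanding, ask)
		}
	}
	return outstanding
}
{code}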
I think the issue is located in the maintenance of the {{sortedRequests}} on
the application. That list used to be rebuilt each cycle, but now we insert
into and delete from the slice. I think we broke things during recovery.
Recovery uses the same path as a node addition, so this *could* happen on
any node add, or maybe even on a simple add of a new ask.
When we call {{application.AddAllocationAsk}} we check that the object is
not allocated. That is always true, as we create a new ask object from the
SI, so we skip to the next step. This triggers a check for an already known
outstanding ask. If that outstanding ask is not allocated we replace the
object with the new one, and we make sure we update the pending resources
on the queues and the application if those have changed.
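A sketch of that flow, using the same illustrative types extended with a
pending amount (resource accounting reduced to a single int64). The
unconditional map write at the end is where the issues below originate:
{code:go}
package scheduler

import "sync"

type Ask struct {
	allocationKey string
	allocated     bool
	pending       int64
}

type Application struct {
	sync.RWMutex
	requests map[string]*Ask
	pending  int64 // aggregate pending resources on queue and app
}

func (a *Application) AddAllocationAsk(newAsk *Ask) {
	a.Lock()
	defer a.Unlock()
	// the new ask is built fresh from the SI, so this always passes
	if newAsk.allocated {
		return
	}
	if old, ok := a.requests[newAsk.allocationKey]; ok && !old.allocated {
		// known outstanding ask: update pending for the difference
		a.pending += newAsk.pending - old.pending
	} else if !ok {
		// brand new ask: all of its resources are pending
		a.pending += newAsk.pending
	}
	// when the old ask IS allocated we reach this overwrite without any
	// pending adjustment: the first issue below
	a.requests[newAsk.allocationKey] = newAsk
}
{code}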
{*}First issue{*}: if the old ask _IS_ allocated we still replace it with
the new one in the requests map, and we skip the adjustment of the pending
resources that uses the already registered ask. This is where it breaks
down: the requests map should never contain already allocated objects. It
means we have a reference leak, and thus a memory leak: long after the
allocation is removed, a reference is kept in requests that does not get
removed until we clean up the application, so the GC cannot collect it. For
long running applications with lots of requests this can become significant.
{*}Second issue{*}: also caused by the replacement. The new object is not
marked allocated, which causes a big problem because we will try to
schedule it. We can now have an unallocated and an allocated object with
the same key, one in requests and one in allocations. After we schedule the
second one, the allocations list is updated and we lose the original info.
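A small, hypothetical illustration of that state: two distinct objects
share one allocation key, one in allocations and one in requests, and
scheduling the second one overwrites the original entry:
{code:go}
package main

import "fmt"

type Ask struct {
	allocationKey string
	allocated     bool
}

func main() {
	// state after the faulty replacement: two objects, one key
	orig := &Ask{allocationKey: "alloc-1", allocated: true}
	repl := &Ask{allocationKey: "alloc-1", allocated: false}

	allocations := map[string]*Ask{orig.allocationKey: orig}
	requests := map[string]*Ask{repl.allocationKey: repl}

	// the scheduler sees repl as outstanding and allocates it again
	repl.allocated = true
	allocations[repl.allocationKey] = repl

	// the original allocation info is gone from allocations, while
	// requests still pins the now allocated replacement
	fmt.Println(allocations["alloc-1"] == orig) // false
	fmt.Println(requests["alloc-1"].allocated)  // true
}
{code}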
{*}Third issue{*}: independent of the state, we proceed to add the ask to
the requests. The requests are stored in a map keyed on the allocation key,
which means we only ever track a single ask per key, never any duplicates.
The sorted requests, however, are a sorted slice of references to objects.
There is no check in the add into the sorted request slice that replaces an
existing entry: we will happily add a second object with the same key. Both
are then considered when scheduling, which can easily cause issues there.
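A sketch of the mismatch, with a plain sorted insert as described plus a
hypothetical guarded variant ({{insertSortedUnique}}); neither is the real
code nor a proposed patch:
{code:go}
package scheduler

import "sort"

type Ask struct {
	allocationKey string
	priority      int64
}

// insertSorted mirrors the described behaviour: a plain sorted insert
// with no check for an existing entry carrying the same key.
func insertSorted(sorted []*Ask, ask *Ask) []*Ask {
	i := sort.Search(len(sorted), func(j int) bool {
		return sorted[j].priority <= ask.priority // highest priority first
	})
	sorted = append(sorted, nil)
	copy(sorted[i+1:], sorted[i:])
	sorted[i] = ask
	return sorted
}

// insertSortedUnique drops any entry with the same allocation key first,
// keeping the slice consistent with the requests map.
func insertSortedUnique(sorted []*Ask, ask *Ask) []*Ask {
	for i, existing := range sorted {
		if existing.allocationKey == ask.allocationKey {
			sorted = append(sorted[:i], sorted[i+1:]...)
			break
		}
	}
	return insertSorted(sorted, ask)
}
{code}
Whatever the real fix looks like, the map and the slice need to be kept in
step on every add, replace and delete.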
This code for adding allocations and asks needs a proper review. Over time,
with multiple changes on top of each other, we have introduced issues here.
> Don't add duplicated allocation to node when the allocation ask fails
> ---------------------------------------------------------------------
>
> Key: YUNIKORN-2895
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2895
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
>
> When I revisit the new update allocation logic: a duplicated allocation
> can be added to the node when the allocation is already allocated, and we
> then try to add the allocation to the node again and do not revert it.