[
https://issues.apache.org/jira/browse/YUNIKORN-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888355#comment-17888355
]
Craig Condit commented on YUNIKORN-2895:
----------------------------------------
{quote}I think the issue is located in the maintenance of the
{{sortedRequests}} on the application. That list used to be rebuilt each
cycle, but now we insert into/delete from the slice. During recovery I think
we broke things. Recovery uses the same path as a node addition, so this
*could* happen on any node add, or maybe even on a simple add of a new ask.
{quote}
This is all controlled from the shim, which ensures that nodes are added
first and referenced pods afterwards. The regression fixed in YUNIKORN-2910
should address this. With that fix, it's not possible for these to be handled
out-of-order.
Additionally, the {{sortedRequests}} slice is only ever updated in lockstep
with the requests map (I have verified this in the code).
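To illustrate the lockstep claim, here is a minimal sketch (hypothetical
types and names, not the actual core code): any removal touches both
structures inside the same critical section, so they cannot drift apart. The
guarded add path is sketched further below.
{code:go}
package main

import (
	"fmt"
	"sync"
)

// Allocation stands in for the core's allocation object (hypothetical
// fields; the real type carries far more state).
type Allocation struct {
	allocationKey string
}

// Application holds the two structures that must stay in lockstep.
type Application struct {
	sync.Mutex
	requests       map[string]*Allocation // keyed by allocationKey
	sortedRequests []*Allocation          // same objects, in scheduling order
}

// removeRequest deletes from the map and the slice under the same lock.
func (app *Application) removeRequest(key string) {
	app.Lock()
	defer app.Unlock()
	delete(app.requests, key)
	for i, a := range app.sortedRequests {
		if a.allocationKey == key {
			app.sortedRequests = append(app.sortedRequests[:i], app.sortedRequests[i+1:]...)
			break
		}
	}
}

func main() {
	a := &Allocation{allocationKey: "alloc-1"}
	app := &Application{
		requests:       map[string]*Allocation{a.allocationKey: a},
		sortedRequests: []*Allocation{a},
	}
	app.removeRequest("alloc-1")
	fmt.Println(len(app.requests), len(app.sortedRequests)) // 0 0
}
{code}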
{quote}{*}First issue{*}: if the old ask _IS_ allocated, we will still replace
that allocation with the new one in the requests map. We skip adjusting the
pending resources using the already registered ask. This is where it breaks
down: the requests list should never contain already allocated objects. It
means we have a reference leak, and thus a memory leak. Long after the
allocation is removed, a reference will be kept in requests that will not get
removed until we clean up the application. The GC will thus not remove it. For
long-running applications with lots of requests this can become significant.
{quote}
This is false. We never replace the allocation; we check for an existing one
and update as necessary. The new allocation passed in (from the SI) is only
ever read from. It is *never* stored in {{requests}} (or
{{allocatedRequests}}) unless that allocationKey has never been seen before.
The {{requests}} map tracks all requests, whether satisfied or not, as does
{{sortedRequests}}. This allows us to check for an allocation in only one
place, and it increases memory only by the size of a pointer per request.
After 1.6.0, asks and allocations are no longer distinct objects, so we
simply store either two or three references to the same object: two when not
yet allocated (in {{requests}} and {{sortedRequests}}), and three when
allocated (in {{requests}}, {{sortedRequests}}, and {{allocatedRequests}}).
There is no memory leak. When an allocation goes away (once it enters a
terminal state), it is removed {*}from all three places{*}; this doesn't only
happen on application termination. We have to keep allocations around for the
lifetime of their associated pods anyway, due to features like mutable pod
resources coming in the near future: an allocation's size can change after it
has been scheduled.
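A rough sketch of the reference layout described above (hypothetical types,
not the actual core code): one object, two or three pointers to it, and a
terminal-state removal that drops all of them at once so the object becomes
garbage-collectable immediately.
{code:go}
package main

import "fmt"

// Allocation is a single object that may be referenced from two or three
// tracking structures (hypothetical fields).
type Allocation struct {
	allocationKey string
	allocated     bool
}

type Application struct {
	requests          map[string]*Allocation // all requests, satisfied or not
	sortedRequests    []*Allocation          // same objects, scheduling order
	allocatedRequests map[string]*Allocation // satisfied requests only
}

// markAllocated adds the third reference once the request is satisfied.
func (app *Application) markAllocated(key string) {
	if alloc, ok := app.requests[key]; ok {
		alloc.allocated = true
		app.allocatedRequests[key] = alloc // third pointer to the same object
	}
}

// removeAllocation drops all references when the allocation reaches a
// terminal state; this runs long before application cleanup.
func (app *Application) removeAllocation(key string) {
	delete(app.requests, key)
	delete(app.allocatedRequests, key)
	for i, a := range app.sortedRequests {
		if a.allocationKey == key {
			app.sortedRequests = append(app.sortedRequests[:i], app.sortedRequests[i+1:]...)
			break
		}
	}
}

func main() {
	app := &Application{
		requests:          map[string]*Allocation{},
		allocatedRequests: map[string]*Allocation{},
	}
	a := &Allocation{allocationKey: "alloc-1"}
	app.requests[a.allocationKey] = a
	app.sortedRequests = append(app.sortedRequests, a) // 2 refs: pending
	app.markAllocated(a.allocationKey)                 // 3 refs: allocated
	app.removeAllocation(a.allocationKey)              // 0 refs: GC-eligible
	fmt.Println(len(app.requests), len(app.sortedRequests), len(app.allocatedRequests))
}
{code}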
{quote}{*}Second issue{*}: also caused by the replacement. The new object is
not marked allocated, which causes a big problem as we will try to schedule
it. We could now have an unallocated and an allocated object with the same
key, one in requests and one in allocations. After we schedule the second
one, the allocations list will be updated and we lose the original info.
{quote}
This is also false. We never *replace* an allocation object if one already
exists. We examine the state and may update resource requests or transition to
allocated state based on the deltas between the existing and new objects. This
is what allows the shim to notify us that an allocation has been satisfied
outside YuniKorn.
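In sketch form (again with hypothetical names, not the real core API): the
update path reads the incoming SI object and applies deltas onto the tracked
one; when the key is already known, the incoming object itself is never
stored.
{code:go}
package main

import "fmt"

// Allocation with a simplified resource field (hypothetical).
type Allocation struct {
	allocationKey string
	allocated     bool
	resourceMB    int64
}

type Application struct {
	requests map[string]*Allocation
}

// updateAllocation reconciles an incoming allocation against the tracked one.
func (app *Application) updateAllocation(incoming *Allocation) {
	existing, ok := app.requests[incoming.allocationKey]
	if !ok {
		// first sighting: store it (the real code also appends to
		// sortedRequests here)
		app.requests[incoming.allocationKey] = incoming
		return
	}
	// already tracked: apply deltas onto the existing object only
	if existing.resourceMB != incoming.resourceMB {
		existing.resourceMB = incoming.resourceMB
	}
	if incoming.allocated && !existing.allocated {
		// the shim is telling us the allocation was satisfied outside YuniKorn
		existing.allocated = true
	}
	// incoming goes out of scope here; it was only ever read from
}

func main() {
	app := &Application{requests: map[string]*Allocation{}}
	app.updateAllocation(&Allocation{allocationKey: "alloc-1", resourceMB: 512})
	app.updateAllocation(&Allocation{allocationKey: "alloc-1", resourceMB: 512, allocated: true})
	fmt.Printf("%+v\n", *app.requests["alloc-1"]) // one tracked object, now allocated
}
{code}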
{quote}{*}Third issue{*}: independent of the state, we proceed to add the ask
to the requests. The requests are stored in a map based on the allocation
key, which means we are only ever tracking a single ask, never any
duplicates. The sorted requests list, however, is a sorted slice of
references to objects. There are no checks when adding into the sorted
request slice to replace an existing entry. We will happily add a second one
to the slice. Two objects with the same key are both considered when
scheduling, which means we can easily cause issues there.
{quote}
Also false. We check the state very carefully and only add to {{requests}}
(and {{sortedRequests}}) if we have *never seen that allocationKey*.
{{sortedRequests}} is only ever appended to when {{requests}} did not already
contain that key. No duplicate checks are needed on {{sortedRequests}}
because the pre-checks before updating the {{requests}} map already ensure
that duplicates cannot happen.
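A minimal sketch of that gate, consistent with the earlier sketches
(hypothetical types): the single lookup in the {{requests}} map is what makes
a second entry for the same key in {{sortedRequests}} impossible.
{code:go}
package main

import "fmt"

type Allocation struct {
	allocationKey string
}

type Application struct {
	requests       map[string]*Allocation
	sortedRequests []*Allocation
}

// tryAdd only appends to sortedRequests when the allocationKey was absent
// from the requests map; that map lookup is the duplicate gate.
func (app *Application) tryAdd(alloc *Allocation) bool {
	if _, seen := app.requests[alloc.allocationKey]; seen {
		return false // already tracked: update in place, never a second append
	}
	app.requests[alloc.allocationKey] = alloc
	app.sortedRequests = append(app.sortedRequests, alloc)
	return true
}

func main() {
	app := &Application{requests: map[string]*Allocation{}}
	fmt.Println(app.tryAdd(&Allocation{allocationKey: "alloc-1"})) // true: first sighting
	fmt.Println(app.tryAdd(&Allocation{allocationKey: "alloc-1"})) // false: duplicate rejected
	fmt.Println(len(app.sortedRequests))                           // 1: never two entries per key
}
{code}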
> Don't add duplicated allocation to node when the allocation ask fails
> ---------------------------------------------------------------------
>
> Key: YUNIKORN-2895
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2895
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
>
> When I revisited the new update allocation logic, I found that a duplicated
> allocation can be added to the node if the allocation is already allocated:
> we try to add the allocation to the node again and don't revert it.