[
https://issues.apache.org/jira/browse/YUNIKORN-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888355#comment-17888355
]
Craig Condit commented on YUNIKORN-2895:
----------------------------------------
{quote}I think the issue is located in the maintenance of the
{{sortedRequests}} on the application. That list used to be rebuilt each
cycle, but now we insert into/delete from the slice. During recovery I think
we broke things. Recovery uses the same path as a node addition, so this
*could* happen on any node add, or maybe even on a simple add of a new ask.
{quote}
This is all controlled from the shim, which ensures that nodes are added
first and referenced pods afterwards. The regression fixed in YUNIKORN-2910
should address this. With that fix, it's not possible for these to be handled
out-of-order.
Additionally, the {{sortedRequests}} slice is only ever updated in lockstep
with the requests map (I have verified this in the code).
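To illustrate the lockstep claim, here is a minimal sketch (hypothetical
types and names, not the actual core code): any removal touches both
structures inside the same critical section, so they cannot drift apart. The
guarded add path is sketched further below.
{code:go}
package main

import (
	"fmt"
	"sync"
)

// Allocation stands in for the core's allocation object (hypothetical
// fields; the real type carries far more state).
type Allocation struct {
	allocationKey string
}

// Application holds the two structures that must stay in lockstep.
type Application struct {
	sync.Mutex
	requests       map[string]*Allocation // keyed by allocationKey
	sortedRequests []*Allocation          // same objects, in scheduling order
}

// removeRequest deletes from the map and the slice under the same lock.
func (app *Application) removeRequest(key string) {
	app.Lock()
	defer app.Unlock()
	delete(app.requests, key)
	for i, a := range app.sortedRequests {
		if a.allocationKey == key {
			app.sortedRequests = append(app.sortedRequests[:i], app.sortedRequests[i+1:]...)
			break
		}
	}
}

func main() {
	a := &Allocation{allocationKey: "alloc-1"}
	app := &Application{
		requests:       map[string]*Allocation{a.allocationKey: a},
		sortedRequests: []*Allocation{a},
	}
	app.removeRequest("alloc-1")
	fmt.Println(len(app.requests), len(app.sortedRequests)) // 0 0
}
{code}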
{quote}{*}First issue{*}: if the old ask _IS_ allocated, we will still replace
that allocation with the new one in the requests map. We skip adjusting the
pending resources using the already registered ask. This is where it breaks
down: the requests list should never contain already allocated objects. It
means we have a reference leak, and thus a memory leak. Long after the
allocation is removed, a reference will be kept in requests that will not get
removed until we clean up the application. The GC will thus not remove it. For
long-running applications with lots of requests this can become significant.
{quote}
This is false. We never replace the allocation; we check for an existing one
and update as necessary. The new allocation passed in (from the SI) is only
ever read from. It is *never* stored in {{requests}} (or
{{allocatedRequests}}) unless that allocationKey has never been seen before.
The {{requests}} map tracks all requests, whether satisfied or not, as does
{{sortedRequests}}. This allows us to check for an allocation in only one
place, and it increases memory only by the size of a pointer per request.
After 1.6.0, asks and allocations are no longer distinct objects, so we
simply store either two or three references to the same object: two when not
yet allocated (in {{requests}} and {{sortedRequests}}), and three when
allocated (in {{requests}}, {{sortedRequests}}, and {{allocatedRequests}}).
There is no memory leak. When an allocation goes away (once it enters a
terminal state), it is removed {*}from all three places{*}; this doesn't only
happen on application termination. We have to keep allocations around for the
lifetime of their associated pods anyway, due to features like mutable pod
resources coming in the near future: an allocation's size can change after it
has been scheduled.
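A rough sketch of the reference layout described above (hypothetical types,
not the actual core code): one object, two or three pointers to it, and a
terminal-state removal that drops all of them at once so the object becomes
garbage-collectable immediately.
{code:go}
package main

import "fmt"

// Allocation is a single object that may be referenced from two or three
// tracking structures (hypothetical fields).
type Allocation struct {
	allocationKey string
	allocated     bool
}

type Application struct {
	requests          map[string]*Allocation // all requests, satisfied or not
	sortedRequests    []*Allocation          // same objects, scheduling order
	allocatedRequests map[string]*Allocation // satisfied requests only
}

// markAllocated adds the third reference once the request is satisfied.
func (app *Application) markAllocated(key string) {
	if alloc, ok := app.requests[key]; ok {
		alloc.allocated = true
		app.allocatedRequests[key] = alloc // third pointer to the same object
	}
}

// removeAllocation drops all references when the allocation reaches a
// terminal state; this runs long before application cleanup.
func (app *Application) removeAllocation(key string) {
	delete(app.requests, key)
	delete(app.allocatedRequests, key)
	for i, a := range app.sortedRequests {
		if a.allocationKey == key {
			app.sortedRequests = append(app.sortedRequests[:i], app.sortedRequests[i+1:]...)
			break
		}
	}
}

func main() {
	app := &Application{
		requests:          map[string]*Allocation{},
		allocatedRequests: map[string]*Allocation{},
	}
	a := &Allocation{allocationKey: "alloc-1"}
	app.requests[a.allocationKey] = a
	app.sortedRequests = append(app.sortedRequests, a) // 2 refs: pending
	app.markAllocated(a.allocationKey)                 // 3 refs: allocated
	app.removeAllocation(a.allocationKey)              // 0 refs: GC-eligible
	fmt.Println(len(app.requests), len(app.sortedRequests), len(app.allocatedRequests))
}
{code}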
{quote}{*}Second issue{*}: also caused by the replacement. The new object is
not marked allocated, which causes a big problem as we will try to schedule
it. We could now have an unallocated and an allocated object with the same
key, one in requests and one in allocations. After we schedule the second
one, the allocations list will be updated and we lose the original info.
{quote}
This is also false. We never *replace* an allocation object if one already
exists. We examine the state and may update resource requests or transition to
allocated state based on the deltas between the existing and new objects. This
is what allows the shim to notify us that an allocation has been satisfied
outside YuniKorn.
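In sketch form (again with hypothetical names, not the real core API): the
update path reads the incoming SI object and applies deltas onto the tracked
one; when the key is already known, the incoming object itself is never
stored.
{code:go}
package main

import "fmt"

// Allocation with a simplified resource field (hypothetical).
type Allocation struct {
	allocationKey string
	allocated     bool
	resourceMB    int64
}

type Application struct {
	requests map[string]*Allocation
}

// updateAllocation reconciles an incoming allocation against the tracked one.
func (app *Application) updateAllocation(incoming *Allocation) {
	existing, ok := app.requests[incoming.allocationKey]
	if !ok {
		// first sighting: store it (the real code also appends to
		// sortedRequests here)
		app.requests[incoming.allocationKey] = incoming
		return
	}
	// already tracked: apply deltas onto the existing object only
	if existing.resourceMB != incoming.resourceMB {
		existing.resourceMB = incoming.resourceMB
	}
	if incoming.allocated && !existing.allocated {
		// the shim is telling us the allocation was satisfied outside YuniKorn
		existing.allocated = true
	}
	// incoming goes out of scope here; it was only ever read from
}

func main() {
	app := &Application{requests: map[string]*Allocation{}}
	app.updateAllocation(&Allocation{allocationKey: "alloc-1", resourceMB: 512})
	app.updateAllocation(&Allocation{allocationKey: "alloc-1", resourceMB: 512, allocated: true})
	fmt.Printf("%+v\n", *app.requests["alloc-1"]) // one tracked object, now allocated
}
{code}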
{quote}{*}Third issue{*}: independent of the state, we proceed to add the ask
to the requests. The requests are stored in a map based on the allocation
key, which means we are only ever tracking a single ask, never any
duplicates. The sorted requests list, however, is a sorted slice of
references to objects. There are no checks when adding into the sorted
request slice to replace an existing entry. We will happily add a second one
to the slice. Two objects with the same key are both considered when
scheduling, which means we can easily cause issues there.
{quote}
Also false. We check the state very carefully and only add to {{requests}}
(and {{sortedRequests}}) if we have *never seen that allocationKey*.
{{sortedRequests}} is only ever appended to when {{requests}} did not already
contain that key. No duplicate checks are needed on {{sortedRequests}}
because the pre-checks before updating the {{requests}} map already ensure
that duplicates cannot happen.
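A minimal sketch of that gate, consistent with the earlier sketches
(hypothetical types): the single lookup in the {{requests}} map is what makes
a second entry for the same key in {{sortedRequests}} impossible.
{code:go}
package main

import "fmt"

type Allocation struct {
	allocationKey string
}

type Application struct {
	requests       map[string]*Allocation
	sortedRequests []*Allocation
}

// tryAdd only appends to sortedRequests when the allocationKey was absent
// from the requests map; that map lookup is the duplicate gate.
func (app *Application) tryAdd(alloc *Allocation) bool {
	if _, seen := app.requests[alloc.allocationKey]; seen {
		return false // already tracked: update in place, never a second append
	}
	app.requests[alloc.allocationKey] = alloc
	app.sortedRequests = append(app.sortedRequests, alloc)
	return true
}

func main() {
	app := &Application{requests: map[string]*Allocation{}}
	fmt.Println(app.tryAdd(&Allocation{allocationKey: "alloc-1"})) // true: first sighting
	fmt.Println(app.tryAdd(&Allocation{allocationKey: "alloc-1"})) // false: duplicate rejected
	fmt.Println(len(app.sortedRequests))                           // 1: never two entries per key
}
{code}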
> Don't add duplicated allocation to node when the allocation ask fails
> ---------------------------------------------------------------------
>
> Key: YUNIKORN-2895
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2895
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
>
> When I revisited the new update allocation logic, I found that a duplicated
> allocation can be added to the node if the allocation is already allocated:
> we try to add the allocation to the node again and don't revert it.