[ 
https://issues.apache.org/jira/browse/YUNIKORN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494364#comment-17494364
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-462:
------------------------------------------------

The SI change has been committed to make sure we track what we are going to do 
based on the plan described in the PR. The next step is modifying the core code 
followed by the k8shim.

We have one change in mind: remove the ReSyncSchedulerCache call. The call is 
made twice from the core to the shim:
 * for new allocations
 * for removed allocations

First, new allocations: the message we send is an RMNewAllocationsEvent with 
all allocations that are new. That event is currently async, which is why we 
first make a sync cache update call with partial info. The change is that the 
event will become a sync call. We have enough information in the current 
Allocations array, which is part of the event we send. We can call the assume 
of the pod inside the cache on the shim side by pulling the key from the 
allocation. We always call assume pod for every new allocation.

Simple change on the SI side: we can remove the AssumedAllocation message and 
leverage the existing information for the AllocationKey.
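
To illustrate the new allocation path described above, here is a minimal 
sketch in Go. It is not the actual shim code: the Allocation, 
NewAllocationsEvent and AssumedPods types are hypothetical, trimmed-down 
stand-ins. It only shows the idea that a synchronous handler can assume each 
pod using the AllocationKey already carried in the event, so no separate 
AssumedAllocation message is needed.

```go
package main

import "fmt"

// Allocation is a hypothetical stand-in for the SI allocation message.
// Only the fields needed for this sketch are included.
type Allocation struct {
	AllocationKey string // identifies the pod on the shim side
	NodeID        string // node the core picked for the allocation
}

// NewAllocationsEvent is a hypothetical stand-in for RMNewAllocationsEvent.
type NewAllocationsEvent struct {
	Allocations []Allocation
}

// AssumedPods is a hypothetical shim-side cache: allocation key -> node.
type AssumedPods struct {
	byKey map[string]string
}

// NewAssumedPods returns an empty assumed pod cache.
func NewAssumedPods() *AssumedPods {
	return &AssumedPods{byKey: make(map[string]string)}
}

// HandleNewAllocations processes the event synchronously and assumes every
// allocation it carries, using the key that is already in the allocation.
func (c *AssumedPods) HandleNewAllocations(ev NewAllocationsEvent) {
	for _, alloc := range ev.Allocations {
		c.byKey[alloc.AllocationKey] = alloc.NodeID
	}
}

// IsAssumed reports whether the pod for the given key is currently assumed.
func (c *AssumedPods) IsAssumed(key string) bool {
	_, ok := c.byKey[key]
	return ok
}

func main() {
	cache := NewAssumedPods()
	cache.HandleNewAllocations(NewAllocationsEvent{Allocations: []Allocation{
		{AllocationKey: "pod-1", NodeID: "node-a"},
		{AllocationKey: "pod-2", NodeID: "node-b"},
	}})
	fmt.Println(cache.IsAssumed("pod-1"), cache.IsAssumed("pod-3"))
}
```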

On the remove side we have 4 locations where we send an 
RMReleaseAllocationEvent. In only one location, as you pointed out, do we also 
call the sync of the cache. The sync of the cache triggers the forget of an 
assumed pod. Looking only at the paths that send events back, the cache sync 
is part of just one of these calls:
 # handleRMUpdateApplicationEvent handles removal of an application. Does not 
call the cache sync.
 # updateNode handles the node removal. Does not call the cache sync.
 # schedule triggers the release of a placeholder. Does not call the cache sync.
 # processAllocationReleases is processing release requests sent by the shim. 
This calls the cache sync.

The termination type for calls 1, 2 and 4 is STOPPED_BY_RM. For call 3 it is 
PLACEHOLDER_REPLACED.

Every single allocation is assumed as per the above description. So, without 
exception, we should also forget a pod (remove it from the assumed pod list) 
when we remove the pod. If we do not, we could leak the entry in the assumed 
pod cache structure. There should be no difference in the communication for 
any of these cases between core and shim.
Simple change on the SI side: we can remove the ForgotAllocation message. Add 
the allocationKey to the AllocationRelease message (check case for the new 
field!)
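
A minimal Go sketch of the release side follows. It assumes a hypothetical 
AllocationRelease struct that already carries the new AllocationKey field, 
plus an illustrative TerminationType enum and AssumedPods cache. None of these 
are the real SI or shim definitions. The point it shows is that the handler 
forgets the pod for every release, whatever the termination type, so no entry 
can leak.

```go
package main

import "fmt"

// TerminationType mirrors the two values named above. The Go type itself is
// illustrative, not the generated SI enum.
type TerminationType int

const (
	StoppedByRM TerminationType = iota
	PlaceholderReplaced
)

// AllocationRelease is a hypothetical release message with the proposed
// AllocationKey field added.
type AllocationRelease struct {
	AllocationKey   string
	TerminationType TerminationType
}

// ReleaseAllocationEvent is a stand-in for RMReleaseAllocationEvent.
type ReleaseAllocationEvent struct {
	Releases []AllocationRelease
}

// AssumedPods is a hypothetical shim-side cache: allocation key -> node.
type AssumedPods struct {
	byKey map[string]string
}

// NewAssumedPods returns an empty assumed pod cache.
func NewAssumedPods() *AssumedPods {
	return &AssumedPods{byKey: make(map[string]string)}
}

// Assume records the pod for the key as assumed on the given node.
func (c *AssumedPods) Assume(key, node string) { c.byKey[key] = node }

// HandleReleases forgets every released allocation. The termination type is
// deliberately not inspected: all release paths are treated the same.
func (c *AssumedPods) HandleReleases(ev ReleaseAllocationEvent) {
	for _, rel := range ev.Releases {
		delete(c.byKey, rel.AllocationKey)
	}
}

// Len returns the number of pods that are still assumed.
func (c *AssumedPods) Len() int { return len(c.byKey) }

func main() {
	cache := NewAssumedPods()
	cache.Assume("pod-1", "node-a")
	cache.Assume("ph-1", "node-a")
	cache.HandleReleases(ReleaseAllocationEvent{Releases: []AllocationRelease{
		{AllocationKey: "pod-1", TerminationType: StoppedByRM},
		{AllocationKey: "ph-1", TerminationType: PlaceholderReplaced},
	}})
	fmt.Println("assumed pods left:", cache.Len())
}
```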

The core sends the events synchronously. The shim collapses the assume call 
into the event processing for new allocations and returns as soon as possible, 
forking off the long running tasks. The shim collapses the forget call into 
the event processing for the remove. If there are any special cases in which 
the assumption of a removed pod should not be forgotten, then the shim must 
implement them. The core should not be the one that decides this.
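
The "return as soon as possible" behaviour could look roughly like the Go 
sketch below. The Shim type, its injected bind function and the Wait helper 
are all hypothetical and exist only for illustration. The cache update happens 
inline, so it is complete when the synchronous call returns, while the long 
running work (for example the bind) is forked into a goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// Allocation is a hypothetical stand-in for the SI allocation message.
type Allocation struct {
	AllocationKey string
	NodeID        string
}

// Shim is a hypothetical event handler with an assumed pod cache.
type Shim struct {
	mu      sync.Mutex
	assumed map[string]string
	pending sync.WaitGroup
	bind    func(Allocation) // long running task, injected for the sketch
}

// NewShim returns a Shim that runs bind for each new allocation.
func NewShim(bind func(Allocation)) *Shim {
	return &Shim{assumed: make(map[string]string), bind: bind}
}

// HandleNewAllocations is called synchronously by the core. It assumes all
// pods before returning and forks the long running work per allocation.
func (s *Shim) HandleNewAllocations(allocs []Allocation) {
	s.mu.Lock()
	for _, a := range allocs {
		s.assumed[a.AllocationKey] = a.NodeID
	}
	s.mu.Unlock()

	for _, a := range allocs {
		s.pending.Add(1)
		go func(a Allocation) {
			defer s.pending.Done()
			s.bind(a)
		}(a)
	}
}

// IsAssumed reports whether the pod for the given key is currently assumed.
func (s *Shim) IsAssumed(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.assumed[key]
	return ok
}

// Wait blocks until all forked tasks have finished.
func (s *Shim) Wait() { s.pending.Wait() }

func main() {
	gate := make(chan struct{})
	// The bind blocks on the gate to simulate a slow operation.
	shim := NewShim(func(a Allocation) { <-gate })
	shim.HandleNewAllocations([]Allocation{{AllocationKey: "pod-1", NodeID: "node-a"}})
	fmt.Println("assumed while bind is still running:", shim.IsAssumed("pod-1"))
	close(gate)
	shim.Wait()
}
```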

After this we also need to completely remove the ReSyncSchedulerCacheArgs and 
the ReSyncSchedulerCache call from the interface.

> Streamline core to shim update on allocation change
> ---------------------------------------------------
>
>                 Key: YUNIKORN-462
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-462
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: core - scheduler, shim - kubernetes
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Manikandan R
>            Priority: Major
>              Labels: pull-request-available
>
> Currently in the scheduler we have two updates that get sent to the shim when 
> an allocation is added or released:
> * event to shim RM event handler to allocate
> * reconciler plugin to update the shim caches
> Before YUNIKORN-317 one update was made in the cache, the other in the 
> scheduler. Now they are both in the scheduler in quick succession. The cache 
> update in the shim is needed to make sure that the predicates are seeing the 
> correct info. The event does the real bind etc of the allocation on the node.
> We should be able to fold the two calls into one call. However this requires 
> changes on both sides and might even impact the SI as it will likely become a 
> synced event call.


