[ 
https://issues.apache.org/jira/browse/YUNIKORN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278280#comment-17278280
 ] 

Weiwei Yang commented on YUNIKORN-462:
--------------------------------------

Hi [~wilfreds], [[email protected]], thanks for looking into this.

{quote}
AssumePod call in scheduler_callback#ReSyncSchedulerCache() can be moved to a 
corresponding NewAllocations "for" loop block in 
scheduler_callback#RecvUpdateResponse. 
{quote}

This may not work. The core runs scheduling cycles in a loop and sends 
allocations to the shim asynchronously. That means the core may already have 
made several allocation decisions before the corresponding allocations have 
been sent to the shim for execution. Each time the core tries to allocate a 
pod, it needs to run the predicate functions. If assumePod is not called at 
that point, the shim-side cache (used by the predicates) becomes stale. This 
can make the predicate evaluation inaccurate, for example when dealing with 
pod-affinity/anti-affinity constraints, volume bindings, etc. 
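
To illustrate the stale-cache concern, here is a minimal sketch in Go. It is 
not YuniKorn code: the shimCache type, the toy anti-affinity predicate, and 
the schedule function are all hypothetical and only model the ordering 
problem described above.

```go
package main

// shimCache is a hypothetical stand-in for the shim-side cache that the
// predicate functions read from. It is not the real YuniKorn type.
type shimCache struct {
	podsOnNode map[string][]string // node name -> labels of pods assumed there
}

func newShimCache() *shimCache {
	return &shimCache{podsOnNode: map[string][]string{}}
}

// assumePod records a scheduling decision in the cache before the bind happens.
func (c *shimCache) assumePod(node, label string) {
	c.podsOnNode[node] = append(c.podsOnNode[node], label)
}

// antiAffinityFits is a toy predicate: a node fits only if no pod with the
// same label is already recorded on it.
func (c *shimCache) antiAffinityFits(node, label string) bool {
	for _, l := range c.podsOnNode[node] {
		if l == label {
			return false
		}
	}
	return true
}

// schedule runs back-to-back scheduling cycles for pods that share one label
// against a single node and returns the pods placed on that node.
// With assumeAtDecision=true the cache is updated inside each cycle.
// With assumeAtDecision=false the update is deferred until the queued
// allocations are delivered, which is after all cycles have already run.
func schedule(pods []string, label, node string, assumeAtDecision bool) []string {
	cache := newShimCache()
	var pending []string // allocations queued for async delivery to the shim
	for _, pod := range pods {
		if !cache.antiAffinityFits(node, label) {
			continue // predicate rejected the node for this pod
		}
		pending = append(pending, pod)
		if assumeAtDecision {
			cache.assumePod(node, label)
		}
	}
	if !assumeAtDecision {
		// Deferred update: too late to influence the cycles above.
		for range pending {
			cache.assumePod(node, label)
		}
	}
	return pending
}
```

In this sketch, scheduling two pods with the same label places only the first 
one when the cache is updated at decision time. With the deferred update both 
pods pass the predicate, which violates the toy anti-affinity rule.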

The error message indicates that the pod has already been removed from the 
cache. This happens because of the following sequence:
# On K8s, a pod gets deleted
# Shim removes the pod from the cache
# Shim sends a release request to the core, asking it to release the 
allocation
# Core releases the allocation and calls the ForgetPod callback
# Shim tries to remove the pod again and reports an error because the pod no 
longer exists in the cache

The remove action is always initiated from the shim side, so it is probably OK 
to remove the ForgetPod call from the core side. This needs to be tested 
carefully: the predicates have been running quite stably so far, and we do not 
want to break that.



> Streamline core to shim update on allocation change
> ---------------------------------------------------
>
>                 Key: YUNIKORN-462
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-462
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler, shim - kubernetes
>            Reporter: Wilfred Spiegelenburg
>            Priority: Major
>
> Currently in the scheduler we have two updates that get sent to the shim when 
> an allocation is added or released:
> * event to shim RM event handler to allocate
> * reconciler plugin to update the shim caches
> Before YUNIKORN-317 one update was made in the cache, the other in the 
> scheduler. Now they are both in the scheduler in quick succession. The cache 
> update in the shim is needed to make sure that the predicates are seeing the 
> correct info. The event does the real bind etc of the allocation on the node.
> We should be able to fold the two calls into one call. However this requires 
> changes on both sides and might even impact the SI as it will likely become a 
> synced event call.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
