[jira] [Commented] (YUNIKORN-462) Streamline core to shim update on allocation change

Wilfred Spiegelenburg (Jira) Tue, 09 Mar 2021 21:59:06 -0800


    [ 
https://issues.apache.org/jira/browse/YUNIKORN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298539#comment-17298539
 ]


Wilfred Spiegelenburg commented on YUNIKORN-462:
------------------------------------------------

This can be pushed to the next release.

The removal of a pod is, however, not always triggered from the shim. With gang 
scheduling the core removes a placeholder pod and then tells the shim to 
execute that request. Even with that taken into account, the cache update 
should always be a shim-internal action. The core should not need to be aware 
of the internal housekeeping of the shim. Releases initiated in the core always 
wait for a confirmation of the release from the shim. This is to take into 
account that releasing an allocation is not instantaneous. Neither core- nor 
shim-initiated releases should need to follow that path, as the shim must 
update all its internal structures before communicating the release or the 
release confirmation.
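
As a rough illustration of the ordering described above, here is a minimal Go 
sketch. All types and method names (Shim, Core, Release, ReleasePlaceholder) 
are hypothetical and are not taken from the YuniKorn code base or the scheduler 
interface. It only shows the shim updating its own cache before it confirms a 
core-initiated release, and the core waiting for that confirmation.

```go
package main

import "fmt"

// Shim is a hypothetical stand-in for the shim side. It owns its cache;
// the core never touches it.
type Shim struct {
	cache map[string]string // allocation ID -> node name
}

// Release performs the shim-internal housekeeping first and only then
// returns the confirmation to the caller.
func (s *Shim) Release(allocID string) bool {
	if _, ok := s.cache[allocID]; !ok {
		return false // unknown allocation, nothing to confirm
	}
	delete(s.cache, allocID) // cache update happens before the confirmation
	return true
}

// Core is a hypothetical stand-in for the core side.
type Core struct {
	allocations map[string]string // allocation ID -> node name
	shim        *Shim
}

// ReleasePlaceholder models a core-initiated release, for example a gang
// scheduling placeholder. The core keeps the allocation until the shim
// has confirmed the release.
func (c *Core) ReleasePlaceholder(allocID string) bool {
	if !c.shim.Release(allocID) {
		return false // no confirmation: leave the core state unchanged
	}
	delete(c.allocations, allocID)
	return true
}

func main() {
	shim := &Shim{cache: map[string]string{"ph-1": "node-a"}}
	core := &Core{allocations: map[string]string{"ph-1": "node-a"}, shim: shim}
	fmt.Println("confirmed:", core.ReleasePlaceholder("ph-1"))
	fmt.Println("left in shim cache:", len(shim.cache))
}
```

In this sketch the core has no separate call to update the shim cache: the 
confirmation itself implies that the shim has finished its housekeeping.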

For adding an allocation: if the core confirms that the allocation is added to 
a node etc., the internal core structures are updated immediately. If the shim 
update needs to become a more synchronous process to allow a shim to update its 
internal structures before the core proceeds, then that needs to be looked at.
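
One possible shape of such a more synchronous add is sketched below in Go. 
This is an assumption for discussion only: AllocationHandler, 
OnAllocationAdded, cachingShim and addAllocation are made-up names, not part 
of the existing scheduler interface. The point is just that a single blocking 
call lets the shim update its caches before the core commits the allocation.

```go
package main

import (
	"errors"
	"fmt"
)

// AllocationHandler is a hypothetical shim-side callback.
type AllocationHandler interface {
	// OnAllocationAdded returns only after the shim has updated its caches.
	OnAllocationAdded(allocID, node string) error
}

// cachingShim is a hypothetical shim that tracks allocations per node.
type cachingShim struct {
	podsByNode map[string][]string
}

func (s *cachingShim) OnAllocationAdded(allocID, node string) error {
	if node == "" {
		return errors.New("allocation has no node")
	}
	// Shim-internal cache update, done before returning to the core.
	s.podsByNode[node] = append(s.podsByNode[node], allocID)
	return nil
}

// addAllocation makes one synchronous call to the shim and commits the
// allocation in the core structures only if that call succeeded.
func addAllocation(coreAllocs map[string]string, h AllocationHandler, allocID, node string) error {
	if err := h.OnAllocationAdded(allocID, node); err != nil {
		return fmt.Errorf("shim rejected allocation %s: %w", allocID, err)
	}
	coreAllocs[allocID] = node
	return nil
}

func main() {
	shim := &cachingShim{podsByNode: map[string][]string{}}
	coreAllocs := map[string]string{}
	if err := addAllocation(coreAllocs, shim, "alloc-1", "node-a"); err != nil {
		fmt.Println("add failed:", err)
		return
	}
	fmt.Println("core:", coreAllocs, "shim:", shim.podsByNode)
}
```

The trade-off is that the core blocks on the shim for every allocation, which 
is exactly the part that would need to be looked at.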

I have gone back over old logs and already see this same message being logged 
as far back as June last year (the oldest log I had), before work even started 
on YUNIKORN-317. This implies that the failing sync via the plugin has been an 
issue for a much longer time. It also means that this is not new in v0.10, as 
the log was taken long before v0.9 was released.

> Streamline core to shim update on allocation change
> ---------------------------------------------------
>
>                 Key: YUNIKORN-462
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-462
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler, shim - kubernetes
>            Reporter: Wilfred Spiegelenburg
>            Priority: Major
>
> Currently in the scheduler we have two updates that get sent to the shim when 
> an allocation is added or released:
> * event to shim RM event handler to allocate
> * reconciler plugin to update the shim caches
> Before YUNIKORN-317 one update was made in the cache, the other in the 
> scheduler. Now they are both in the scheduler in quick succession. The cache 
> update in the shim is needed to make sure that the predicates are seeing the 
> correct info. The event does the real bind etc of the allocation on the node.
> We should be able to fold the two calls into one call. However this requires 
> changes on both sides and might even impact the SI as it will likely become a 
> synced event call.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

