[
https://issues.apache.org/jira/browse/YUNIKORN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298539#comment-17298539
]
Wilfred Spiegelenburg commented on YUNIKORN-462:
------------------------------------------------
This can be pushed to the next release.
The removal of a pod is, however, not always triggered from the shim. With gang
scheduling the core removes a placeholder pod and then tells the shim to
execute that request. Even with that taken into account, the cache update
should always be a shim-internal action. The core should not need to be aware
of the internal housekeeping of the shim. Releases initiated in the core always
wait for a confirmation of the release from the shim. This takes into account
that releasing an allocation is not instantaneous. Neither core-initiated nor
shim-initiated releases should need to follow that path, as the shim must
update all its internal structures before communicating the release or the
release confirmation.
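To illustrate the ordering I mean for a core-initiated release, here is a
minimal sketch in Go. All type and function names (Shim, Core, Release,
RequestRelease) are made up for this example and do not mirror the real core
or shim code. The assumption is only that the shim cleans up its own cache
before it confirms, and that the core keeps the release pending until that
confirmation arrives.

```go
package main

import "fmt"

// NOTE: every name in this sketch is hypothetical; it is not YuniKorn code.

// Shim holds the shim-side cache: allocation ID -> node name.
type Shim struct {
	cache map[string]string
}

// Release performs the shim-internal housekeeping first and only then
// returns the confirmation that would be sent back to the core.
func (s *Shim) Release(allocID string) bool {
	if _, ok := s.cache[allocID]; !ok {
		return false // unknown allocation, nothing to confirm
	}
	delete(s.cache, allocID) // cache update is a shim-internal action
	return true
}

// Core tracks allocations and the releases still waiting for confirmation.
type Core struct {
	allocations map[string]string
	pending     map[string]bool
}

// RequestRelease marks the release as pending, asks the shim to execute it,
// and only drops the allocation once the shim has confirmed.
func (c *Core) RequestRelease(s *Shim, allocID string) {
	c.pending[allocID] = true
	if s.Release(allocID) {
		delete(c.allocations, allocID)
		delete(c.pending, allocID)
	}
}

func main() {
	shim := &Shim{cache: map[string]string{"placeholder-1": "node-a"}}
	core := &Core{
		allocations: map[string]string{"placeholder-1": "node-a"},
		pending:     map[string]bool{},
	}
	core.RequestRelease(shim, "placeholder-1")
	fmt.Println("shim cache entries:", len(shim.cache))
	fmt.Println("core allocations:", len(core.allocations))
	fmt.Println("pending releases:", len(core.pending))
}
```

With this ordering the core never needs a separate call to update the shim
cache: by the time the confirmation arrives the shim is already consistent.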
For adding an allocation: if the core confirms that the allocation is added to
a node etc., the internal core structures are updated immediately. If the shim
update needs to become a more synchronous process, to allow a shim to update
its internal structures before the core proceeds, then that needs to be looked
at.
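One possible shape of such a synchronous update, sketched in Go under my own
assumptions: the core sends the new allocation together with an
acknowledgement channel and blocks until the shim has updated its cache. The
names (allocUpdate, runShim, addAllocation) are hypothetical and this is not
how the scheduler interface is defined today.

```go
package main

import "fmt"

// NOTE: hypothetical types for illustration; not the scheduler interface.

// allocUpdate carries a new allocation plus a channel that the shim closes
// once its internal structures have been updated.
type allocUpdate struct {
	allocID string
	node    string
	done    chan struct{}
}

// runShim applies each update to the shim cache and then acknowledges it.
func runShim(updates <-chan allocUpdate, cache map[string]string) {
	for u := range updates {
		cache[u.allocID] = u.node
		close(u.done) // acknowledge only after the cache is updated
	}
}

// addAllocation updates the core state immediately, then blocks until the
// shim has acknowledged the update before the core proceeds.
func addAllocation(updates chan<- allocUpdate, coreState map[string]string, allocID, node string) {
	coreState[allocID] = node
	u := allocUpdate{allocID: allocID, node: node, done: make(chan struct{})}
	updates <- u
	<-u.done
}

func main() {
	updates := make(chan allocUpdate)
	shimCache := map[string]string{}
	coreState := map[string]string{}
	go runShim(updates, shimCache)

	addAllocation(updates, coreState, "alloc-1", "node-a")
	// Safe to read the shim cache here: the acknowledgement orders the
	// shim's write before this read.
	fmt.Println("shim sees alloc-1 on:", shimCache["alloc-1"])
	close(updates)
}
```

The cost of this approach is that the core blocks on every allocation until
the shim answers, which is why it would need a closer look first.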
I have gone back over old logs and already see this same message being logged
as far back as June last year (the oldest log I had), before work even started
on YUNIKORN-317. This implies that the failing sync via the plugin has been an
issue for a much longer time. It also means that this is not new in v0.10, as
the log was taken long before v0.9 was released.
> Streamline core to shim update on allocation change
> ---------------------------------------------------
>
> Key: YUNIKORN-462
> URL: https://issues.apache.org/jira/browse/YUNIKORN-462
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler, shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Priority: Major
>
> Currently in the scheduler we have two updates that get sent to the shim when
> an allocation is added or released:
> * event to shim RM event handler to allocate
> * reconciler plugin to update the shim caches
> Before YUNIKORN-317 one update was made in the cache, the other in the
> scheduler. Now they are both in the scheduler in quick succession. The cache
> update in the shim is needed to make sure that the predicates are seeing the
> correct info. The event does the real bind etc of the allocation on the node.
> We should be able to fold the two calls into one call. However, this requires
> changes on both sides and might even impact the SI, as it will likely become a
> synced event call.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)