[
https://issues.apache.org/jira/browse/YUNIKORN-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111601#comment-17111601
]
Wilfred Spiegelenburg commented on YUNIKORN-169:
------------------------------------------------
The panic appears when the allocation ask is removed before we can confirm the
allocation in the scheduler.
At that point the cache has been updated with the allocation and the scheduler
is finalising the allocation on its side.
The shim has been sent an update stating that the allocation has happened.
That update is part of the cache update, which happens at the same time as we
update the scheduler. Before the shim has processed that update it still sees
the ask as outstanding. It then cancels the ask; the reason is unknown and not
relevant to this issue. The scheduler processes the ask cancellation in a
separate goroutine from the main scheduler processing. The confirmation and
the cancellation are now racing.
If the cancellation wins, the confirmation panics.
If the confirmation wins, there is no issue: the shim has been given an
allocation, which it can then cancel.
> panic when removing allocation ask with inflight allocation
> -------------------------------------------------------------
>
> Key: YUNIKORN-169
> URL: https://issues.apache.org/jira/browse/YUNIKORN-169
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Major
> Labels: pull-request-available
> Attachments: yunikorn-scheduler-panic.txt
>
>
> There is a race that causes a nil pointer dereference, and thus a panic, when
> an allocation ask is removed while the allocation is in progress. The panic
> is captured in the attached log.