[
https://issues.apache.org/jira/browse/YUNIKORN-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294287#comment-17294287
]
Weiwei Yang commented on YUNIKORN-551:
--------------------------------------
I've merged the changes to master and branch-0.10.
> node removal races for lock during scheduling
> ---------------------------------------------
>
> Key: YUNIKORN-551
> URL: https://issues.apache.org/jira/browse/YUNIKORN-551
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 0.10
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.10
>
>
> A more complicated version of the deadlock mentioned in YUNIKORN-481.
> In this case the scheduler is racing with node removal, which in turn
> removes allocations from the application. The locks taken are all short-term
> locks, but it could happen that the application being scheduled also has an
> allocation on a node being removed.
> Scheduling requires the write-locked app to request a read lock on the
> partition to get all known nodes. The partition takes its write lock while
> removing the node from its internal list and keeps holding that write lock
> while removing the allocations, which tries to lock the app.
> The partition should have released the lock immediately after the node was
> removed from the list, as the rest of the updates do not modify the
> partition object.
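
The lock ordering described above can be illustrated with a minimal Go sketch.
This is not the YuniKorn code: the Partition and App types, their fields, and
the RemoveNode, RemoveNodeAllocs and NodeCount methods are hypothetical names
made up for this example. It only shows the idea that the partition write lock
is released as soon as the node list has been updated, before any application
lock is requested.

```go
package main

import (
	"fmt"
	"sync"
)

// App is a hypothetical stand-in for a scheduled application.
type App struct {
	sync.RWMutex
	allocs map[string]string // allocation ID -> node ID
}

// RemoveNodeAllocs drops every allocation the app holds on nodeID
// and returns how many were removed. It takes the app write lock.
func (a *App) RemoveNodeAllocs(nodeID string) int {
	a.Lock()
	defer a.Unlock()
	removed := 0
	for id, n := range a.allocs {
		if n == nodeID {
			delete(a.allocs, id)
			removed++
		}
	}
	return removed
}

// Partition is a hypothetical stand-in for the partition object.
type Partition struct {
	sync.RWMutex
	nodes map[string]bool
}

// NodeCount models the read the scheduling path performs on the
// partition while it already holds the app lock.
func (p *Partition) NodeCount() int {
	p.RLock()
	defer p.RUnlock()
	return len(p.nodes)
}

// RemoveNode holds the partition write lock only while the node list
// is modified. The lock is released before any app lock is requested,
// so the partition lock is never held while waiting on an app.
func (p *Partition) RemoveNode(nodeID string, apps []*App) int {
	p.Lock()
	_, known := p.nodes[nodeID]
	delete(p.nodes, nodeID)
	p.Unlock()

	if !known {
		return 0
	}
	removed := 0
	for _, app := range apps {
		removed += app.RemoveNodeAllocs(nodeID)
	}
	return removed
}

func main() {
	p := &Partition{nodes: map[string]bool{"node-1": true, "node-2": true}}
	app := &App{allocs: map[string]string{"alloc-1": "node-1", "alloc-2": "node-2"}}
	removed := p.RemoveNode("node-1", []*App{app})
	fmt.Println("allocations removed:", removed, "nodes left:", p.NodeCount())
}
```

With this ordering a scheduling goroutine that holds the app lock can always
obtain the partition read lock, because the removal path no longer holds the
partition write lock while it waits for the app.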
--
This message was sent by Atlassian Jira
(v8.3.4#803005)