[
https://issues.apache.org/jira/browse/YUNIKORN-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated YUNIKORN-551:
------------------------------------
Labels: pull-request-available (was: )
> node removal races for lock during scheduling
> ---------------------------------------------
>
> Key: YUNIKORN-551
> URL: https://issues.apache.org/jira/browse/YUNIKORN-551
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 0.10
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Blocker
> Labels: pull-request-available
>
> A more complicated version of the deadlock mentioned in YUNIKORN-481.
> In this case the scheduler is racing with the node removal, which in turn
> removes allocations from the application. The locks taken are all short-term
> locks, but it could happen that the application being scheduled also has an
> allocation on a node being removed.
> Scheduling requires the write-locked app to request a read lock on the
> partition to get all known nodes. The partition takes its write lock while
> removing the node from its internal list and keeps hold of that write lock
> while removing the allocations, which in turn tries to lock the app.
> The partition should have released the lock immediately after the node was
> removed from the list as the rest of the updates are not modifying the
> partition object.
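
The lock ordering described above can be sketched in Go as follows. This is
only an illustration under stated assumptions, not the actual YuniKorn code:
the Partition and Application types, their fields, and the method names are
all hypothetical. The point it shows is that RemoveNode drops the partition
write lock right after the node leaves the list, so it never holds the
partition lock while waiting for an application lock.

```go
package main

import (
	"fmt"
	"sync"
)

// Application is a hypothetical stand-in for the scheduler's application object.
type Application struct {
	sync.RWMutex
	allocations map[string]string // allocation ID -> node ID
}

// Partition is a hypothetical stand-in for the partition object.
type Partition struct {
	sync.RWMutex
	nodes map[string]bool
	apps  []*Application
}

// RemoveAllocationsOnNode write locks the app and drops its allocations on nodeID.
func (a *Application) RemoveAllocationsOnNode(nodeID string) int {
	a.Lock()
	defer a.Unlock()
	removed := 0
	for id, n := range a.allocations {
		if n == nodeID {
			delete(a.allocations, id)
			removed++
		}
	}
	return removed
}

// NodeCount read locks the partition and returns the number of known nodes.
func (p *Partition) NodeCount() int {
	p.RLock()
	defer p.RUnlock()
	return len(p.nodes)
}

// Schedule mimics a scheduling cycle: app write lock first, then partition read lock.
func (a *Application) Schedule(p *Partition) int {
	a.Lock()
	defer a.Unlock()
	return p.NodeCount()
}

// RemoveNode holds the partition write lock only while the node list changes.
// The allocations are removed after the lock is released, so this code path
// never waits for an app lock while holding the partition lock.
func (p *Partition) RemoveNode(nodeID string) int {
	p.Lock()
	_, known := p.nodes[nodeID]
	delete(p.nodes, nodeID)
	apps := append([]*Application(nil), p.apps...) // snapshot under the lock
	p.Unlock()

	if !known {
		return 0
	}
	released := 0
	for _, app := range apps {
		released += app.RemoveAllocationsOnNode(nodeID)
	}
	return released
}

// newDemo builds a partition with two nodes and one app with an allocation on each.
func newDemo() (*Partition, *Application) {
	app := &Application{allocations: map[string]string{
		"alloc-1": "node-1",
		"alloc-2": "node-2",
	}}
	p := &Partition{
		nodes: map[string]bool{"node-1": true, "node-2": true},
		apps:  []*Application{app},
	}
	return p, app
}

func main() {
	p, app := newDemo()
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // scheduler side
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			app.Schedule(p)
		}
	}()
	released := 0
	go func() { // node removal side
		defer wg.Done()
		released = p.RemoveNode("node-1")
	}()
	wg.Wait()
	fmt.Println("released allocations:", released, "nodes left:", p.NodeCount())
}
```

If RemoveNode instead kept the partition write lock while calling
RemoveAllocationsOnNode, the two goroutines could each hold one lock and wait
for the other, which is the race this issue describes.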
--
This message was sent by Atlassian Jira
(v8.3.4#803005)