[ https://issues.apache.org/jira/browse/YUNIKORN-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294287#comment-17294287 ]

Weiwei Yang commented on YUNIKORN-551:
--------------------------------------

I've merged the changes to master and branch-0.10.

> node removal races for lock during scheduling
> ---------------------------------------------
>
>                 Key: YUNIKORN-551
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-551
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>    Affects Versions: 0.10
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.10
>
>
> A more complicated version of the deadlock mentioned in YUNIKORN-481.
> In this case the scheduler races with node removal, which in turn
> removes allocations from the application. The locks taken are all short-term
> locks, but the application being scheduled may also have an
> allocation on the node being removed.
> Scheduling holds the application write lock and then requests a read lock on
> the partition to get all known nodes. Node removal takes the partition write
> lock to remove the node from its internal list, and keeps holding that write
> lock while removing the node's allocations, which tries to lock the
> application. Each side then waits for the lock the other one holds.
> The partition should release its lock immediately after the node is
> removed from the list, as the remaining updates do not modify the
> partition object.
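The lock ordering described above can be illustrated with a minimal Go sketch. It is not the YuniKorn code: the types `Application` and `Partition` and the methods `schedule`, `nodeCount`, `removeAllocationsOnNode` and `RemoveNode` are hypothetical stand-ins. Scheduling takes the app lock and then the partition read lock. The fixed node removal holds the partition write lock only while the node set is changed, and takes app locks only after releasing it, so no lock cycle can form.

```go
package main

import (
	"fmt"
	"sync"
)

// Application is a hypothetical stand-in for a scheduler application.
// Its lock guards the allocations map (allocation ID -> node ID).
type Application struct {
	sync.RWMutex
	allocations map[string]string
}

// Partition is a hypothetical stand-in for the partition object.
// Its lock guards only the partition's own node set and app list.
type Partition struct {
	sync.RWMutex
	nodes map[string]bool
	apps  []*Application
}

// nodeCount mirrors "get all known nodes": it needs the partition read lock.
func (p *Partition) nodeCount() int {
	p.RLock()
	defer p.RUnlock()
	return len(p.nodes)
}

// schedule mirrors the scheduling path: the app is write locked first and
// the partition read lock is requested second (lock order: app -> partition).
func (a *Application) schedule(p *Partition) int {
	a.Lock()
	defer a.Unlock()
	return p.nodeCount()
}

// removeAllocationsOnNode takes the app write lock to drop allocations.
func (a *Application) removeAllocationsOnNode(nodeID string) int {
	a.Lock()
	defer a.Unlock()
	removed := 0
	for allocID, n := range a.allocations {
		if n == nodeID {
			delete(a.allocations, allocID) // deleting during range is allowed
			removed++
		}
	}
	return removed
}

// RemoveNode holds the partition write lock only while the node set is
// modified. App locks are taken after it is released, so the partition lock
// is never held while waiting for an app lock.
func (p *Partition) RemoveNode(nodeID string) int {
	p.Lock()
	_, ok := p.nodes[nodeID]
	delete(p.nodes, nodeID)
	apps := append([]*Application(nil), p.apps...) // snapshot under the same lock
	p.Unlock()
	if !ok {
		return 0
	}
	removed := 0
	for _, app := range apps {
		removed += app.removeAllocationsOnNode(nodeID)
	}
	return removed
}

// demo runs many scheduling goroutines concurrently with one node removal.
func demo() (removed, nodesLeft int) {
	app := &Application{allocations: map[string]string{"a1": "n1", "a2": "n1", "a3": "n2"}}
	part := &Partition{nodes: map[string]bool{"n1": true, "n2": true}, apps: []*Application{app}}

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			app.schedule(part)
		}()
	}
	removed = part.RemoveNode("n1")
	wg.Wait()
	return removed, part.nodeCount()
}

func main() {
	removed, left := demo()
	fmt.Println(removed, left) // prints "2 1"
}
```

If `RemoveNode` instead kept the partition write lock while calling `removeAllocationsOnNode`, a goroutine inside `schedule` could hold the app lock while waiting for the partition read lock, and the two would block each other.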



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
