Wilfred Spiegelenburg created YUNIKORN-551:
----------------------------------------------

             Summary: node removal races for lock during scheduling
                 Key: YUNIKORN-551
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-551
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
    Affects Versions: 0.10
            Reporter: Wilfred Spiegelenburg
            Assignee: Wilfred Spiegelenburg


A more complicated version of the deadlock mentioned in YUNIKORN-481.

In this case the scheduler races with the node removal, which in turn
removes allocations from the application. The locks taken are all short-term
locks, but the application being scheduled can also have an allocation on the
node being removed.

Scheduling requires the write-locked app to request a read lock on the
partition to get all known nodes. The partition takes its write lock while
removing the node from its internal list and keeps holding that write lock
while removing the allocations, which tries to lock the app. Each side then
waits for a lock the other side holds.
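
The lock-order inversion described above can be reproduced in a small,
deterministic sketch. This is not YuniKorn code: lockedObject and
inversionBlocked are hypothetical names, and the example uses the
non-blocking TryLock and TryRLock methods of sync.RWMutex (available from
Go 1.18) so that it reports the conflict instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedObject is a hypothetical stand-in for the application and the
// partition; only the locking behaviour matters for this sketch.
type lockedObject struct {
	sync.RWMutex
}

// inversionBlocked sets up the state described in the report and returns
// whether each side would block on its second lock acquisition.
func inversionBlocked() (schedulerBlocked, removerBlocked bool) {
	var app, partition lockedObject

	app.Lock()       // scheduler: holds the app write lock while scheduling
	partition.Lock() // node removal: holds the partition write lock

	// Scheduler side: needs a partition read lock to list the nodes.
	if partition.TryRLock() {
		partition.RUnlock()
	} else {
		schedulerBlocked = true
	}

	// Node removal side: needs the app lock to remove the allocations.
	if app.TryLock() {
		app.Unlock()
	} else {
		removerBlocked = true
	}

	partition.Unlock()
	app.Unlock()
	return schedulerBlocked, removerBlocked
}

func main() {
	s, r := inversionBlocked()
	fmt.Println("scheduler blocked:", s, "node removal blocked:", r)
}
```

With blocking Lock and RLock calls in two goroutines, both sides being
blocked at the same time is the deadlock.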

The partition should release the lock immediately after the node is
removed from the list, as the rest of the updates do not modify the
partition object.
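
A minimal sketch of that locking pattern, assuming simplified types: the
Partition and Application structs, their fields, and the unlinkNode,
RemoveNode and RemoveNodeAllocations methods are all hypothetical and do not
mirror the real YuniKorn core API. The partition write lock covers only the
update of the node list; the allocations are removed after it has been
released.

```go
package main

import (
	"fmt"
	"sync"
)

// Application is a hypothetical stand-in for a scheduled application.
type Application struct {
	sync.RWMutex
	allocations map[string]string // allocation ID -> node ID
}

// RemoveNodeAllocations drops every allocation placed on nodeID and
// returns how many were removed. It takes the application lock.
func (a *Application) RemoveNodeAllocations(nodeID string) int {
	a.Lock()
	defer a.Unlock()
	removed := 0
	for id, node := range a.allocations {
		if node == nodeID {
			delete(a.allocations, id)
			removed++
		}
	}
	return removed
}

// Partition is a hypothetical stand-in for the partition object.
type Partition struct {
	sync.RWMutex
	nodes map[string]bool
	apps  []*Application
}

// unlinkNode removes the node from the internal list under the write
// lock, which is released as soon as the function returns. It hands back
// a copy of the application list so the caller can work without the lock.
func (p *Partition) unlinkNode(nodeID string) ([]*Application, bool) {
	p.Lock()
	defer p.Unlock()
	if !p.nodes[nodeID] {
		return nil, false
	}
	delete(p.nodes, nodeID)
	return append([]*Application(nil), p.apps...), true
}

// RemoveNode unlinks the node and then removes its allocations. No
// partition lock is held while the application locks are taken.
func (p *Partition) RemoveNode(nodeID string) int {
	apps, ok := p.unlinkNode(nodeID)
	if !ok {
		return 0
	}
	removed := 0
	for _, app := range apps {
		removed += app.RemoveNodeAllocations(nodeID)
	}
	return removed
}

// NodeCount returns the number of known nodes under a read lock.
func (p *Partition) NodeCount() int {
	p.RLock()
	defer p.RUnlock()
	return len(p.nodes)
}

func main() {
	app := &Application{allocations: map[string]string{
		"alloc-1": "node-1",
		"alloc-2": "node-2",
	}}
	p := &Partition{
		nodes: map[string]bool{"node-1": true, "node-2": true},
		apps:  []*Application{app},
	}
	fmt.Println(p.RemoveNode("node-1"), p.NodeCount()) // prints "1 1"
}
```

Because the partition lock is never held while an application lock is
requested, a scheduler that holds an application write lock and asks for the
partition read lock waits at most for the short list update.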



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
