[ 
https://issues.apache.org/jira/browse/YUNIKORN-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1900.
----------------------------------

> Orphan allocation due to placeholder deletes
> --------------------------------------------
>
>                 Key: YUNIKORN-1900
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1900
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> Gang-scheduled applications can leave orphaned allocations. This can happen
> when the gang scheduling setup specifies only one taskgroup with one member
> for the app.
> This setup by itself is not a problem and works. Replacing the placeholder
> with the real allocation triggers the issue: the replacement temporarily
> removes all allocations and, with only one gang member, leaves no pending
> asks. That triggers the state change of the application to COMPLETING. This
> is the correct state change for an app that has nothing left: no allocations
> and no asks.
> Triggering the state change is, however, a problem. If the allocation of the
> driver is not a replacement, the COMPLETING application moves to RUNNING via
> a state update: we trigger a state change in that case and the issue does
> not occur. For placeholder replacements we trigger the state change, if
> needed, on the removal of the placeholder, not when the real allocation is
> confirmed.
> If the confirmation is processed before the COMPLETING state times out, the
> allocation is added to the node and never cleaned up. When the COMPLETING
> state times out, the application is removed without cleaning up the
> allocation.
> The allocation cleanup is not triggered because the COMPLETING state should
> never be entered with allocations on the app.
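
The sequence described above can be illustrated with a toy model. This is a
minimal sketch only, not YuniKorn code: the types App and Node, the functions
removePlaceholder, confirmReplacement, completingTimeout and simulate, and the
revive flag are all hypothetical names invented for illustration. With
revive=false the state is checked only on placeholder removal, as the report
describes for replacements. With revive=true the confirmation also triggers a
state update, as the report describes for non-replacement allocations.

```go
package main

import "fmt"

// State is a toy application state; names mirror the states in the report.
type State string

const (
	Running    State = "RUNNING"
	Completing State = "COMPLETING"
	Removed    State = "REMOVED" // hypothetical label for "application removed"
)

// App is a hypothetical, heavily simplified application (not YuniKorn's type).
type App struct {
	State       State
	Allocations map[string]bool
	PendingAsks int
}

// Node is a hypothetical node tracking the allocations placed on it.
type Node struct {
	Allocations map[string]bool
}

// removePlaceholder drops the placeholder from app and node, then checks
// whether the app has nothing left and should move to COMPLETING.
func (a *App) removePlaceholder(n *Node, id string) {
	delete(a.Allocations, id)
	delete(n.Allocations, id)
	if len(a.Allocations) == 0 && a.PendingAsks == 0 {
		a.State = Completing
	}
}

// confirmReplacement adds the real allocation to app and node. Only when
// revive is true does it also move a COMPLETING app to RUNNING.
func (a *App) confirmReplacement(n *Node, id string, revive bool) {
	a.Allocations[id] = true
	n.Allocations[id] = true
	if revive && a.State == Completing {
		a.State = Running
	}
}

// completingTimeout removes an app that is still COMPLETING. No allocation
// cleanup happens here: COMPLETING is assumed to mean "nothing left".
func (a *App) completingTimeout() {
	if a.State == Completing {
		a.State = Removed
	}
}

// simulate runs the one-taskgroup, one-member scenario and returns the final
// app state plus the number of allocations still present on the node.
func simulate(revive bool) (State, int) {
	node := &Node{Allocations: map[string]bool{"placeholder-1": true}}
	app := &App{State: Running, Allocations: map[string]bool{"placeholder-1": true}}
	app.removePlaceholder(node, "placeholder-1")
	app.confirmReplacement(node, "real-1", revive)
	app.completingTimeout()
	return app.State, len(node.Allocations)
}

func main() {
	state, left := simulate(false)
	fmt.Println(state, left) // REMOVED 1: the app is gone, the allocation is orphaned
	state, left = simulate(true)
	fmt.Println(state, left) // RUNNING 1: the allocation still has an owner
}
```

In the first run the node still holds one allocation although the application
has been removed, which corresponds to the orphan described in the report.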



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

