[ 
https://issues.apache.org/jira/browse/YUNIKORN-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Hua Lan reassigned YUNIKORN-2737:
-------------------------------------

    Assignee: Tzu-Hua Lan

> Cleanup handleFailApplicationEvent handling
> -------------------------------------------
>
>                 Key: YUNIKORN-2737
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2737
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: shim - kubernetes
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Tzu-Hua Lan
>            Priority: Major
>
> When the shim handles a failed application in 
> {{handleFailApplicationEvent()}}, it calls the placeholder cleanup.
> There are three issues:
>  * The cleanup takes the app lock after it takes the mgr lock, and the app 
> lock is already held when we process the event. The cleanup should be the 
> last step so that the manager lock is not held for longer than needed.
>  * Failing an application is triggered by the core, which should already do 
> the cleanup, so this call might be redundant to start with.
>  * The failure handling also marks unassigned pods as failed, which means 
> the failure handling and the placeholder cleanup overlap. We should remove 
> that overlap: either ignore all placeholders in the failure handling or only 
> clean up assigned placeholders.
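
The lock-ordering concern in the first point can be illustrated with a minimal Go sketch. This is not the YuniKorn shim code: the types application and placeholderManager, their fields, and the simplified handleFailApplicationEvent signature are all hypothetical stand-ins. The sketch also assumes the handler can release the app lock before the cleanup runs, which the issue does not state.

```go
package main

import (
	"fmt"
	"sync"
)

// application is a hypothetical stand-in for the shim's application object.
type application struct {
	lock         sync.Mutex
	id           string
	state        string
	placeholders []string
}

// placeholderManager is a hypothetical stand-in for the placeholder manager.
type placeholderManager struct {
	lock    sync.Mutex
	cleaned []string
}

// cleanUp takes the manager lock first and the application lock second,
// matching the lock order described in the issue.
func (m *placeholderManager) cleanUp(app *application) {
	m.lock.Lock()
	defer m.lock.Unlock()
	app.lock.Lock()
	defer app.lock.Unlock()
	m.cleaned = append(m.cleaned, app.placeholders...)
	app.placeholders = nil
}

// handleFailApplicationEvent does the failure bookkeeping under the app lock,
// releases it, and only then runs the cleanup as the last step. The manager
// lock is therefore held only for the duration of the cleanup itself.
// Assumption: releasing the app lock before the cleanup is acceptable.
func handleFailApplicationEvent(mgr *placeholderManager, app *application) {
	app.lock.Lock()
	app.state = "Failed"
	app.lock.Unlock()

	// Cleanup goes last; sync.Mutex is not reentrant, so calling cleanUp
	// while still holding app.lock would deadlock in this sketch.
	mgr.cleanUp(app)
}

func main() {
	app := &application{id: "app-1", placeholders: []string{"ph-1", "ph-2"}}
	mgr := &placeholderManager{}
	handleFailApplicationEvent(mgr, app)
	fmt.Println(app.state, len(mgr.cleaned), len(app.placeholders))
}
```

Whether the cleanup call is needed at all depends on the second point: if the core already cleans up on failure, the last step in the handler could be dropped entirely.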



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
