[ https://issues.apache.org/jira/browse/YUNIKORN-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tzu-Hua Lan reassigned YUNIKORN-2737:
-------------------------------------
Assignee: Tzu-Hua Lan
> Cleanup handleFailApplicationEvent handling
> -------------------------------------------
>
> Key: YUNIKORN-2737
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2737
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Wilfred Spiegelenburg
> Assignee: Tzu-Hua Lan
> Priority: Major
>
> When we handle a failed application in the shim in
> {{handleFailApplicationEvent()}}, we call the placeholder cleanup.
> There are three issues:
> * The cleanup takes the manager lock and then needs the app lock. The app
> lock is already held when we process the event. The cleanup should be placed
> last so that we do not hold the manager lock for longer than needed.
> * Failing an application is triggered by the core, which should already do
> the cleanup, so this call might be redundant to start with.
> * The failure handling also marks unassigned pods as failed, which means
> there is an overlap between the failure handling and the placeholder cleanup
> that we should remove. Either ignore all placeholders in the failure handling
> or only clean up assigned placeholders.
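A minimal Go sketch of the second option ("only clean up assigned placeholders"), assuming a simplified model of the pods the shim tracks. The Task type and the functions unassignedToFail, assignedPlaceholdersToClean and overlap are hypothetical names for illustration only, not the shim's actual API. It shows that if the failure handling selects every unassigned pod and the cleanup selects only assigned placeholders, the two sets are disjoint by construction.

```go
package main

import "fmt"

// Task is a HYPOTHETICAL, simplified view of a pod tracked by the shim.
// Field names are illustrative and do not match the real shim types.
type Task struct {
	Name        string
	Placeholder bool // true if the pod is a gang-scheduling placeholder
	Assigned    bool // true if the pod has already been bound to a node
}

// unassignedToFail (hypothetical) selects the pods the failure handling
// marks as failed: every unassigned pod, placeholder or not.
func unassignedToFail(tasks []Task) []string {
	var names []string
	for _, t := range tasks {
		if !t.Assigned {
			names = append(names, t.Name)
		}
	}
	return names
}

// assignedPlaceholdersToClean (hypothetical) selects the pods the
// placeholder cleanup removes under this option: assigned placeholders only.
func assignedPlaceholdersToClean(tasks []Task) []string {
	var names []string
	for _, t := range tasks {
		if t.Placeholder && t.Assigned {
			names = append(names, t.Name)
		}
	}
	return names
}

// overlap returns the names present in both lists. With the split above it
// is always empty, because the two predicates disagree on Assigned.
func overlap(a, b []string) []string {
	seen := make(map[string]bool, len(a))
	for _, n := range a {
		seen[n] = true
	}
	var both []string
	for _, n := range b {
		if seen[n] {
			both = append(both, n)
		}
	}
	return both
}

func main() {
	tasks := []Task{
		{Name: "ph-1", Placeholder: true, Assigned: true},
		{Name: "ph-2", Placeholder: true, Assigned: false},
		{Name: "driver", Placeholder: false, Assigned: false},
	}
	failed := unassignedToFail(tasks)
	cleaned := assignedPlaceholdersToClean(tasks)
	fmt.Println("fail:", failed)                      // fail: [ph-2 driver]
	fmt.Println("cleanup:", cleaned)                  // cleanup: [ph-1]
	fmt.Println("overlap:", overlap(failed, cleaned)) // overlap: []
}
```

The first option (ignore all placeholders in the failure handling) would instead add `!t.Placeholder` to the failure predicate and let the cleanup take every placeholder; either split removes the overlap.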
--
This message was sent by Atlassian Jira
(v8.20.10#820010)