[
https://issues.apache.org/jira/browse/YUNIKORN-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854589#comment-17854589
]
Wilfred Spiegelenburg commented on YUNIKORN-2526:
-------------------------------------------------
Removing target for 1.5.2 as discussed in the community sync.
We have two changes going into the 1.5.2 release that fix some issues in this
area: YUNIKORN-2637 and YUNIKORN-2665. Both fix inconsistencies on recovery.
They might not address the root cause, so we are keeping this open for
monitoring and confirmation.
> Discrepancy between shim cache and core app/task list after scheduler restart
> -----------------------------------------------------------------------------
>
> Key: YUNIKORN-2526
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2526
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Reporter: Shravan Achar
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: log-snippet.txt,
> logs-2be04314-bed0-4385-9ae7-50ed0ef9d9d5.txt.zip,
> logs-49f01ed0-3473-4521-b11f-80e27adb7250.txt.zip,
> logs-complete-post.txt.zip, logs-since-restart.txt, state-dump-4-1-3.json,
> state-dump-4-17.json.zip
>
>
> When the scheduler restarts, it occasionally gets into a situation where an
> application is still in the Running state even though it has been
> terminated in the cluster. This is confirmed by the attached state dump.
>
> The scheduler core logs (also attached) indicate that all nodes are being
> evaluated for the non-existent application. CPU is being consumed by this
> unneeded evaluation.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)