[ https://issues.apache.org/jira/browse/YUNIKORN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko updated YUNIKORN-2764:
-----------------------------------
Target Version: 1.9.0 (was: 1.8.0)
> Consider to log explicit placeholder release reason to originator pod
> ---------------------------------------------------------------------
>
> Key: YUNIKORN-2764
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2764
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Yu-Lin Chen
> Priority: Major
> Attachments: image-2024-07-19-21-48-54-829.png
>
>
> When placeholder allocations are released with terminationType
> `si.TerminationType_TIMEOUT`, the reason can be one of the following:
> # "releasing allocated placeholders on placeholder timeout"
> ([Link-1|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L434])
> ([Link-2|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L456])
> # "releasing placeholders on app complete"
> ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L360])
> # "cancel placeholder: resource incompatible"
> ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L1148])
> These reasons are carried in the
> `*si.AllocationResponse` ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/context.go#L901])
> and passed to the shim. However, the shim does not expose them; it simply logs
> an event to the originator pod with a generic reason
> ([Link|https://github.com/apache/yunikorn-k8shim/blob/f2819084f8720aa0eec8e1f41a886413b22d93b2/pkg/cache/application.go#L695-L696]):
> * Type: Warning
> * Reason: GangScheduling
> * Message: Application XXXXXX placeholder has been timed out
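To make the gap concrete, here is a minimal Go sketch. It assumes a hypothetical, simplified `release` struct and `terminationType` standing in for the release entries the core sends in `si.AllocationResponse`; these are not the real SI types. All three causes arrive with the same timeout termination type, so only the message text tells them apart, and a fixed event text (as the shim uses today) discards it.

```go
package main

import "fmt"

// terminationType is a hypothetical stand-in for si.TerminationType.
type terminationType string

// terminationTimeout stands in for si.TerminationType_TIMEOUT.
const terminationTimeout terminationType = "TIMEOUT"

// release is a hypothetical, simplified view of one placeholder release
// delivered to the shim; it is not the real SI message type.
type release struct {
	placeholder string
	termination terminationType
	message     string
}

// genericEventMessage mirrors the shim behavior described in the issue:
// the release message is ignored and a fixed text is used instead.
func genericEventMessage(appID string, _ release) string {
	return fmt.Sprintf("Application %s placeholder has been timed out", appID)
}

func main() {
	releases := []release{
		{"ph-1", terminationTimeout, "releasing allocated placeholders on placeholder timeout"},
		{"ph-2", terminationTimeout, "releasing placeholders on app complete"},
		{"ph-3", terminationTimeout, "cancel placeholder: resource incompatible"},
	}
	for _, r := range releases {
		// Same termination type and same event text for three different causes.
		fmt.Printf("type=%s event=%q core reason=%q\n",
			r.termination, genericEventMessage("app-1", r), r.message)
	}
}
```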
> We could consider exposing the actual reason on the originator pod. For
> example, the event on the originator pod would become:
> * Type: Warning
> * Reason: GangScheduling
> * Message: placeholder xxx has been released. (reason: cancel placeholder:
> resource incompatible)
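The proposed event text could be built roughly as follows. This is a hypothetical sketch: the function name `placeholderReleasedMessage` and the fallback to the current generic wording when no reason is supplied are assumptions, not existing k8shim code. The event type and reason (Warning, GangScheduling) would stay as they are; only the message changes.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholderReleasedMessage builds the event text proposed in the issue for
// the originator pod. Name and fallback behavior are hypothetical.
func placeholderReleasedMessage(appID, placeholder, reason string) string {
	reason = strings.TrimSpace(reason)
	if reason == "" {
		// No reason supplied by the core: keep today's generic wording.
		return fmt.Sprintf("Application %s placeholder has been timed out", appID)
	}
	return fmt.Sprintf("placeholder %s has been released. (reason: %s)", placeholder, reason)
}

func main() {
	fmt.Println(placeholderReleasedMessage("app-1", "ph-3", "cancel placeholder: resource incompatible"))
	fmt.Println(placeholderReleasedMessage("app-1", "ph-1", ""))
}
```

Falling back to the existing text keeps the event meaningful if a release arrives without a message.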
--
This message was sent by Atlassian Jira
(v8.20.10#820010)