[ 
https://issues.apache.org/jira/browse/YUNIKORN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit updated YUNIKORN-2764:
-----------------------------------
    Target Version: 1.8.0  (was: 1.7.0)

> Consider to log explicit placeholder release reason to originator pod
> ---------------------------------------------------------------------
>
>                 Key: YUNIKORN-2764
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2764
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: shim - kubernetes
>            Reporter: Yu-Lin Chen
>            Priority: Major
>         Attachments: image-2024-07-19-21-48-54-829.png
>
>
> When placeholder allocations are released with terminationType 
> `si.TerminationType_TIMEOUT`, the reason could be one of the following:
>  # "releasing allocated placeholders on placeholder timeout" 
> ([Link-1|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L434])
>  
> ([Link-2|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L456])
>  # "releasing placeholders on app complete" 
> ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L360])
>  # "cancel placeholder: resource incompatible" 
> ([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/objects/application.go#L1148])
> Those reasons are encapsulated in 
> *si.AllocationResponse([Link|https://github.com/apache/yunikorn-core/blob/21303191a9c9ee791de3371ddca973df51b7a755/pkg/scheduler/context.go#L901])
>  and passed to the shim. However, the shim doesn’t expose them; it simply logs an 
> event to the originator pod with a generic reason 
> ([Link|https://github.com/apache/yunikorn-k8shim/blob/f2819084f8720aa0eec8e1f41a886413b22d93b2/pkg/cache/application.go#L695-L696]):
>  * Type: Warning
>  * Reason: GangScheduling
>  * Message: Application XXXXXX placeholder has been timed out
> We could consider exposing the true reason to the originator pod. For example (in the 
> originator pod):
>  * Type: Warning
>  * Reason: GangScheduling
>  * Message: placeholder xxx has been released. (reason: cancel placeholder: 
> resource incompatible)
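
The proposed message format could be sketched roughly as follows. This is only an illustration of the idea in the description, not the shim's actual code: the function name `placeholderReleaseMessage`, its parameters, and the fallback text used when no reason is supplied are all assumptions, and no yunikorn-k8shim or scheduler-interface types are used.

```go
package main

import (
	"fmt"
	"strings"
)

// fallbackReason is a hypothetical default used when the core passes no
// explicit reason; the real shim may choose different wording.
const fallbackReason = "placeholder timeout"

// placeholderReleaseMessage (hypothetical helper) builds the event message
// for the originator pod, embedding the release reason reported by the core.
func placeholderReleaseMessage(placeholderName, reason string) string {
	reason = strings.TrimSpace(reason)
	if reason == "" {
		// No explicit reason available: fall back to a generic one.
		reason = fallbackReason
	}
	return fmt.Sprintf("placeholder %s has been released. (reason: %s)", placeholderName, reason)
}

func main() {
	// Example with one of the reasons listed in the description.
	fmt.Println(placeholderReleaseMessage("tg-example-0", "cancel placeholder: resource incompatible"))
}
```

Keeping the message construction in one small function would let the three core-side reasons flow through unchanged, while still giving a sensible message when the reason is empty.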



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
