[
https://issues.apache.org/jira/browse/YUNIKORN-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chaoran Yu updated YUNIKORN-657:
--------------------------------
Description:
An application may fail for a number of reasons. For example:
* In gang scheduling, placeholders expire before all of them can be
successfully allocated
* When no placement rules are defined (i.e. static queues are used), an
application is submitted to a non-existent queue
* The total amount of resources requested by a gang-scheduled app exceeds the
capacity of the queue
YK's finite state machine has Failed as a terminal state of an app, meaning
that YK will never try to bring a failed app back. As a consequence, the
pods of such failed apps stay stuck in Pending indefinitely. A better
behavior is for YK to mark those pods as failed too, and to pass the
reason for the failure on to those pods.
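The proposed behavior could look roughly like the Go sketch below. It is only
an illustration under stated assumptions: the Pod and Application types, the
FailPendingPods function and the "ApplicationFailed" reason string are all
hypothetical stand-ins, not the shim's or client-go's real types. The Phase,
Reason and Message fields are loosely modeled on the Kubernetes pod status.

```go
package main

import "fmt"

// NOTE: every type and name in this file is a hypothetical stand-in used for
// illustration; none of it is taken from the YuniKorn code base.

// PodPhase holds the pod phase values relevant to this sketch.
type PodPhase string

const (
	PodPending PodPhase = "Pending"
	PodRunning PodPhase = "Running"
	PodFailed  PodPhase = "Failed"
)

// Pod is a simplified pod status record.
type Pod struct {
	Name    string
	Phase   PodPhase
	Reason  string // short, machine-readable cause
	Message string // human-readable detail
}

// Application is a simplified application together with its pods.
type Application struct {
	ID            string
	State         string // "Failed" is treated as the terminal failure state
	FailureReason string
	Pods          []*Pod
}

// FailPendingPods marks every pending pod of a failed application as failed
// and copies the application's failure reason into the pod's message.
// Pods in any other phase are left untouched. It returns the number of pods
// that were updated, which is 0 if the application has not failed.
func FailPendingPods(app *Application) int {
	if app.State != "Failed" {
		return 0
	}
	updated := 0
	for _, pod := range app.Pods {
		if pod.Phase != PodPending {
			continue
		}
		pod.Phase = PodFailed
		pod.Reason = "ApplicationFailed" // assumed reason string
		pod.Message = fmt.Sprintf("application %s failed: %s", app.ID, app.FailureReason)
		updated++
	}
	return updated
}
```

In this sketch only pending pods are touched, so pods that are already running
or finished keep their own status. A real implementation would have to write
the status back through the Kubernetes API rather than mutate local structs.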
> Expose reason of application failure to pods
> --------------------------------------------
>
> Key: YUNIKORN-657
> URL: https://issues.apache.org/jira/browse/YUNIKORN-657
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Chaoran Yu
> Assignee: Chaoran Yu
> Priority: Major
>
> An application may fail for a number of reasons. For example:
> * In gang scheduling, placeholders expire before all of them can be
> successfully allocated
> * When no placement rules are defined (i.e. static queues are used), an
> application is submitted to a non-existent queue
> * The total amount of resources requested by a gang-scheduled app exceeds the
> capacity of the queue
> YK's finite state machine has Failed as a terminal state of an app,
> meaning that YK will never try to bring a failed app back. As a
> consequence, the pods of such failed apps stay stuck in Pending
> indefinitely. A better behavior is for YK to mark those pods as failed too,
> and to pass the reason for the failure on to those pods.