[
https://issues.apache.org/jira/browse/YUNIKORN-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088014#comment-17088014
]
Weiwei Yang commented on YUNIKORN-103:
--------------------------------------
[~adam.antal] is correct.
At present, we don't have e2e app state mgmt in the scheduler-core. Right
now, as long as an application is accepted, we consider it running, and it
never finishes (I know this (n)). There is far more work to do than fixing a
"bug": to have complete app lifecycle management, we will need to understand
the app better. There are a couple of ways:
* We can do something special for Spark, or apps like Spark, where they have a
driver (AM) pod. When the driver is running, we can consider the job
running; when the driver is completed, we can consider the job completed.
* We can leverage K8s operators, such as the Spark/Flink/TF operators. Their
CRDs have a clear definition of app state. For this effort, please see
[https://github.com/apache/incubator-yunikorn-core/blob/master/docs/design/pluggable-app-management.md].
In my prototype for YUNIKORN-100, I am trying to improve the state transitions
on the yunikorn-core side; that's the nearest improvement we can make for this.
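To make the first option (driver pod) concrete, here is a minimal sketch of how an app state could be derived from the driver pod's phase. This is only an illustration, not the yunikorn-core implementation: the AppState type, its values, and appStateFromDriverPhase are hypothetical names, and mapping a failed driver to a failed app is my assumption. The phase strings are the standard Kubernetes pod phases.

```go
package main

import "fmt"

// AppState is a hypothetical application state used only for this sketch;
// the names are not taken from yunikorn-core.
type AppState string

const (
	Accepted  AppState = "Accepted"
	Running   AppState = "Running"
	Completed AppState = "Completed"
	Failed    AppState = "Failed"
)

// appStateFromDriverPhase maps the Kubernetes pod phase of the driver (AM)
// pod to an application state: a running driver means a running app, a
// succeeded driver means a completed app. Treating a failed driver as a
// failed app is an assumption of this sketch.
func appStateFromDriverPhase(phase string) AppState {
	switch phase {
	case "Running":
		return Running
	case "Succeeded":
		return Completed
	case "Failed":
		return Failed
	default:
		// "Pending", "Unknown", or anything else: the app has been
		// accepted but its driver has not started running yet.
		return Accepted
	}
}

func main() {
	for _, phase := range []string{"Pending", "Running", "Succeeded", "Failed"} {
		fmt.Printf("driver pod %s -> app %s\n", phase, appStateFromDriverPhase(phase))
	}
}
```

With a mapping like this, an app whose driver pod is still Pending would stay in an accepted state instead of being reported as running.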
> Web UI shows the applications as Running, even if they are in Pending state
> ---------------------------------------------------------------------------
>
> Key: YUNIKORN-103
> URL: https://issues.apache.org/jira/browse/YUNIKORN-103
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - cache, webapp
> Reporter: Kinga Marton
> Priority: Minor
> Attachments: Screenshot 2020-04-20 at 14.27.50.png, Screenshot
> 2020-04-20 at 14.28.24.png
>
>
> When there are some applications that are unschedulable for some reason
> (in my case it was a resource problem), in the Kubernetes dashboard I could
> see that the Pod was in Pending state, but the YuniKorn API showed that the
> application was running.
> I think this mismatch can cause misunderstandings, and it would be good to
> have the status of an application in sync with the pod status, or at least
> document the state machine of the applications.
> [~wwei] do we have such documentation? If yes, please share it with me and
> the issue can be closed.