[ 
https://issues.apache.org/jira/browse/YUNIKORN-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088014#comment-17088014
 ] 

Weiwei Yang commented on YUNIKORN-103:
--------------------------------------

[~adam.antal] is correct.

At present, we don't have end-to-end (e2e) app state management in the 
scheduler-core. Right now, as soon as an application is accepted we consider 
it running, and it never finishes (I know this is not good). This takes far 
more work than fixing a "bug": to have complete app lifecycle management, the 
scheduler needs to understand the app better. There are a couple of ways:
 * We can do something special for Spark, or apps like Spark, that have a 
driver (AM) pod: while the driver is running, we consider the job running; 
once the driver has completed, we consider the job completed.
 * We can leverage K8s operators, such as the Spark/Flink/TF operators. Their 
CRDs have a clear definition of app state. For this effort, please see 
[https://github.com/apache/incubator-yunikorn-core/blob/master/docs/design/pluggable-app-management.md].
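
To make the first option concrete, here is a minimal sketch of deriving an app state from the driver pod's phase. The pod phases are the standard Kubernetes ones; the `AppState` names and the `app_state_from_driver` helper are hypothetical and only for illustration, they are not what the scheduler-core implements today.

```python
from enum import Enum


class AppState(Enum):
    # Hypothetical application states, not YuniKorn's actual state names.
    ACCEPTED = "Accepted"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Kubernetes pod phases are Pending, Running, Succeeded, Failed and Unknown.
_PHASE_TO_STATE = {
    "Pending": AppState.ACCEPTED,
    "Running": AppState.RUNNING,
    "Succeeded": AppState.COMPLETED,
    "Failed": AppState.FAILED,
}


def app_state_from_driver(driver_phase, previous=AppState.ACCEPTED):
    """Derive the application state from the driver pod's phase.

    An unrecognised phase (including "Unknown") keeps the previous state,
    so a temporarily unreachable node does not flip the app state.
    """
    return _PHASE_TO_STATE.get(driver_phase, previous)
```

With a mapping like this, an app whose driver pod is still Pending would stay in an accepted state instead of being reported as running.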

In my prototype for YUNIKORN-100, I am trying to improve the state transitions 
on the yunikorn-core side; that is the nearest-term improvement we can make 
for this.
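
As a rough illustration of what explicit state transitions could look like, here is a small sketch of a transition table that keeps "accepted" and "running" apart and gives apps terminal states. The state names, the `ALLOWED` table and the `transition` helper are all assumptions made for this example; they do not describe the YUNIKORN-100 prototype.

```python
# Hypothetical transition table: each state maps to the states it may move to.
ALLOWED = {
    "New": {"Accepted", "Rejected"},
    "Accepted": {"Running", "Failed"},
    "Running": {"Completed", "Failed"},
    "Rejected": set(),
    "Completed": set(),
    "Failed": set(),
}


def transition(current, target):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


def is_terminal(state):
    """A state is terminal when no further transition is allowed from it."""
    return not ALLOWED.get(state, set())
```

A table like this would also double as the documentation of the state machine that the issue description asks for.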

> Web UI shows the applications as Running, even if they are in Pending state
> ---------------------------------------------------------------------------
>
>                 Key: YUNIKORN-103
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-103
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - cache, webapp
>            Reporter: Kinga Marton
>            Priority: Minor
>         Attachments: Screenshot 2020-04-20 at 14.27.50.png, Screenshot 
> 2020-04-20 at 14.28.24.png
>
>
> When some applications are unschedulable for some reason (in my case it was 
> a resource problem), the Kubernetes dashboard showed the Pod in Pending 
> state, but the YuniKorn API showed that the application was running. 
> I think this mismatch can cause misunderstandings, and it would be good to 
> keep the status of an application in sync with the pod status, or at least 
> document the state machine of the applications.
> [~wwei] do we have such documentation? If yes, please share it with me and 
> the issue can be closed.



