[ https://issues.apache.org/jira/browse/YUNIKORN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140092#comment-17140092 ]

Wilfred Spiegelenburg commented on YUNIKORN-230:
------------------------------------------------

There was a race condition in the way we were handling resource requests. It
affected the start transition and could cause the app to go
{{accepted --> waiting --> running}} instead of
{{accepted --> starting --> running}}. This was fixed a couple of days ago via
YUNIKORN-222. Because of that race, the application would only spend a very
short time in any state other than {{running}}, so you would not see those
states in the UI.

The state timeout for starting is 5 minutes. So if you run an app with just
one pod/container, it will stay in starting for a maximum of 5 minutes.

This is what I see on my system, and it is correct according to the
documentation I added earlier this week via YUNIKORN-99.

> Incorrect application state returned for "ws/v1/apps" REST call
> ---------------------------------------------------------------
>
>                 Key: YUNIKORN-230
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-230
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Ayub Pathan
>            Priority: Major
>
> YuniKorn image: latest
> {noformat}
>   yunikorn-scheduler-web:
>     Container ID:   docker://e649fb4db6a3b822bb2a6bc5e4bf607e5848548df7eb57f24dbe37550ec87ec3
>     Image:          apache/yunikorn:web-0.9.0
>     Image ID:       docker-pullable://apache/yunikorn@sha256:52e5cfc8823e38d50249f2c3fcd50b0f2755ffb79534482e0d9d67f8b8e604f3
>  {noformat}
> *Steps to reproduce:*
>  * Deploy the job
>  * Check the status
> {noformat}
> kubectl get pods -n development
> NAME       READY   STATUS    RESTARTS   AGE
> sleepjob   1/1     Running   0          26s {noformat}
>  * Verify the API response; it still shows the state as Starting.
> {noformat}
> [
>     {
>         "allocations": [
>             {
>                 "allocationKey": "11fc645b-b0c7-11ea-aeee-0e65480c53e2",
>                 "allocationTags": null,
>                 "applicationId": "abcd",
>                 "nodeId": "ip-10-192-172-176.ca-central-1.compute.internal",
>                 "partition": "default",
>                 "priority": "<nil>",
>                 "queueName": "root.development",
>                 "resource": "[memory:50 vcore:100]",
>                 "uuid": "5a35de0b-b0d4-4434-9f17-3618faa0e247"
>             }
>         ],
>         "applicationID": "abcd",
>         "applicationState": "Starting",
>         "partition": "[mycluster]default",
>         "queueName": "root.development",
>         "submissionTime": 1592417964439019878,
>         "usedResource": "[memory:50 vcore:100]"
>     }
> ] {noformat}
>  * Job completed
> {noformat}
> kubectl get pods -n development
> NAME       READY   STATUS      RESTARTS   AGE
> sleepjob   0/1     Completed   0          60s {noformat}
>  * The API response still shows the state as Starting
> {noformat}
> [
>     {
>         "allocations": null,
>         "applicationID": "abcd",
>         "applicationState": "Starting",
>         "partition": "[mycluster]default",
>         "queueName": "root.development",
>         "submissionTime": 1592417964439019878,
>         "usedResource": "[memory:0 vcore:0]"
>     }
> ] {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
