[ https://issues.apache.org/jira/browse/YUNIKORN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140092#comment-17140092 ]
Wilfred Spiegelenburg commented on YUNIKORN-230:
------------------------------------------------
There was a race condition in the way we were handling resource requests. This
affected the start transition and could cause the app to go {{accepted -->
waiting --> running}} instead of {{accepted --> starting --> running}}. This
was fixed a couple of days ago via YUNIKORN-222. Because of that race the
application would only spend a very short time in any state other than
{{running}}, so you would not see it in the UI.
The state timeout for starting is 5 minutes. So if you run an app with just
one pod/container, it will stay in starting for at most 5 minutes.
This is what I see on my system, and it is correct based on the documentation I
added earlier this week via YUNIKORN-99.
> Incorrect application state returned for "ws/v1/apps" REST call
> ---------------------------------------------------------------
>
> Key: YUNIKORN-230
> URL: https://issues.apache.org/jira/browse/YUNIKORN-230
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Ayub Pathan
> Priority: Major
>
> YuniKorn image: latest
> {noformat}
> yunikorn-scheduler-web:
> Container ID:
> docker://e649fb4db6a3b822bb2a6bc5e4bf607e5848548df7eb57f24dbe37550ec87ec3
> Image: apache/yunikorn:web-0.9.0
> Image ID:
> docker-pullable://apache/yunikorn@sha256:52e5cfc8823e38d50249f2c3fcd50b0f2755ffb79534482e0d9d67f8b8e604f3
> {noformat}
> *Steps to reproduce:*
> * Deploy the job
> * Check the status
> {noformat}
> kubectl get pods -n development
>
> NAME       READY   STATUS    RESTARTS   AGE
> sleepjob   1/1     Running   0          26s
> {noformat}
> * Verify the API response; it still shows the state as Starting.
> {noformat}
> [
> {
> "allocations": [
> {
> "allocationKey": "11fc645b-b0c7-11ea-aeee-0e65480c53e2",
> "allocationTags": null,
> "applicationId": "abcd",
> "nodeId": "ip-10-192-172-176.ca-central-1.compute.internal",
> "partition": "default",
> "priority": "<nil>",
> "queueName": "root.development",
> "resource": "[memory:50 vcore:100]",
> "uuid": "5a35de0b-b0d4-4434-9f17-3618faa0e247"
> }
> ],
> "applicationID": "abcd",
> "applicationState": "Starting",
> "partition": "[mycluster]default",
> "queueName": "root.development",
> "submissionTime": 1592417964439019878,
> "usedResource": "[memory:50 vcore:100]"
> }
> ] {noformat}
> * Job completed
> {noformat}
> kubectl get pods -n development
>
> NAME       READY   STATUS      RESTARTS   AGE
> sleepjob   0/1     Completed   0          60s
> {noformat}
> * The API response still shows the state as Starting
> {noformat}
> [
> {
> "allocations": null,
> "applicationID": "abcd",
> "applicationState": "Starting",
> "partition": "[mycluster]default",
> "queueName": "root.development",
> "submissionTime": 1592417964439019878,
> "usedResource": "[memory:0 vcore:0]"
> }
> ] {noformat}
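To check the reported state without reading the JSON by hand, here is a minimal Go sketch that decodes a {{ws/v1/apps}} response body like the ones quoted above and maps each application ID to its {{applicationState}}. The {{appInfo}} struct and {{statesByApp}} helper are hypothetical; only the field names come from the responses in the report. Fetching the body (for example with {{net/http}}) is left out because the scheduler's host and port depend on the deployment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// appInfo holds the subset of top-level fields from the ws/v1/apps
// response that are needed to check an application's state.
type appInfo struct {
	ApplicationID    string `json:"applicationID"`
	ApplicationState string `json:"applicationState"`
	QueueName        string `json:"queueName"`
	UsedResource     string `json:"usedResource"`
}

// statesByApp decodes a ws/v1/apps response body and maps each
// application ID to its reported state. Unknown fields are ignored.
func statesByApp(body []byte) (map[string]string, error) {
	var apps []appInfo
	if err := json.Unmarshal(body, &apps); err != nil {
		return nil, err
	}
	states := make(map[string]string, len(apps))
	for _, a := range apps {
		states[a.ApplicationID] = a.ApplicationState
	}
	return states, nil
}

// sample is the second response from the report, after the job completed.
const sample = `[
  {
    "allocations": null,
    "applicationID": "abcd",
    "applicationState": "Starting",
    "partition": "[mycluster]default",
    "queueName": "root.development",
    "submissionTime": 1592417964439019878,
    "usedResource": "[memory:0 vcore:0]"
  }
]`

func main() {
	states, err := statesByApp([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(states["abcd"]) // Starting
}
```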
--
This message was sent by Atlassian Jira
(v8.3.4#803005)