[
https://issues.apache.org/jira/browse/YUNIKORN-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
haorenhui updated YUNIKORN-2989:
--------------------------------
Description:
h2. Production environment phenomenon
I encountered a state transition issue during testing. I have two pods that use
the same applicationId, and pod2 is submitted some time after pod1 completes.
When pod1 finishes running, the app it belongs to (flow1) transitions to the
Completing state and waits 30 seconds (the default) before moving to Completed.
When I submit pod2, it may remain stuck in the Pending state. Checking the
yunikorn scheduler, I found the following log: after pod2 was added and its
status changed to Pending, the app was deleted.
{code:none}
2024-11-25T12:07:43.137Z INFO objects/application_state.go:133 Application
state transition {"appID": "flow1", "source": "Running", "destination":
"Completing", "event": "completeApplication"}
2024-11-25T12:07:43.137Z INFO scheduler/partition.go:1249 removing allocation
from application
{"appID":"flow1","allocationId":"a2e95dfb-3646-42b8-b774-6b2abe84bae6","terminationType":
"STOPPED_BY_RM"}
2024-11-25T12:08:12.903Z INFO cache/context.go:871 task added
{"appID":"flow1","taskID":"16f461e9-a856-428e-a27b-e67d9105da69", "taskState":
"New"}
2024-11-25T12:08:13.137Z INFO objects/application.go:127 YK_APP_SUMMARY:
{"appID":
"flow1","submissionTime":1732533620209,"startTime":1732533622247,"finishTime":
1732536493137,"user":"admin","queue":"root.default","state":
"Completed","rmID":"cluster",resourceUsage":
{"UNKNOHN":"ephemeral-storage":1009530880000000,"memory":1034360822300672,"pods":98587,"vcore":2221462721}}
2024-11-25T12:08:13.137Z INFO scheduler/partition.go:1418 Removing terminated
application from the application list {"appID": "flow1","app
status":"Completed"}
2024-11-25T12:08:13.137Z INFO objects/application_state.go:133 Application
state transition
Unknown macro: \{"appID"}
2024-11-25T12:08:13.143Z INFO cache/task_state.go:380 Task state transition
{"app":"flow1","task":"16f461e9-a856-428e-a27b-e67d91ca69","taskAlias":"test/pod2",
"source": "New","destination":"Pending","event": "InitTask"}
{code}
# Task 16f461e9-a856-428e-a27b-e67d9105da69 is never scheduled until I manually
delete it.
# This issue is not deterministic and requires specific triggering conditions
(my environment has many pods and apps, and the problem only occurs
occasionally): if pod2 is submitted within 30 seconds of pod1 completing, the
app switches back to Running; if it is submitted after 30 seconds, the app is
rebuilt.
# My suspicion is that while the task is being added, the application in the
core is transitioning to Completed, which triggers the callback that clears
resources. Once the app the task belongs to has been removed, the task becomes
an orphan and is never scheduled by anyone.
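The suspected sequence can be sketched as a minimal Go model. This is
illustrative only, not YuniKorn's actual code: the Partition/App types and the
AddTask/CompleteApp methods are invented to show why a task added just before
the completing timeout fires ends up orphaned.

{code:go}
package main

import (
	"fmt"
	"sync"
)

// App is a stand-in for a core application with a state machine.
type App struct {
	mu    sync.Mutex
	state string
	tasks []string
}

// Partition is a stand-in for the scheduler's application list.
type Partition struct {
	mu   sync.Mutex
	apps map[string]*App
}

// AddTask attaches a task to an app; it fails if the app is gone.
func (p *Partition) AddTask(appID, taskID string) bool {
	p.mu.Lock()
	app, ok := p.apps[appID]
	p.mu.Unlock()
	if !ok {
		return false // app already removed: the task is orphaned
	}
	app.mu.Lock()
	app.tasks = append(app.tasks, taskID)
	app.mu.Unlock()
	return true
}

// CompleteApp models Completing -> Completed: the app leaves the list.
func (p *Partition) CompleteApp(appID string) {
	p.mu.Lock()
	delete(p.apps, appID)
	p.mu.Unlock()
}

func main() {
	p := &Partition{apps: map[string]*App{"flow1": {state: "Completing"}}}

	// Order observed in the logs: the new task is added while the app
	// is still Completing...
	p.AddTask("flow1", "task-2")

	// ...then the completing timeout fires and the app is removed from
	// the partition, taking the just-added task with it. The shim still
	// holds the task in Pending, but the core no longer knows the app,
	// so nothing will ever schedule it.
	p.CompleteApp("flow1")
	fmt.Println("core still knows flow1:", p.AddTask("flow1", "task-2"))
}
{code}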
h2. Test environment reproduction
I have reliably reproduced this situation in a stable testing environment. This
is my testing procedure:
Create 100 Pods simultaneously in a queue, each belonging to a separate
application (pod1->app1, pod2->app2, ...). Each pod exits automatically 30
seconds after startup (sleep 30). Then start a script that checks the status of
these pods; once a pod is detected as complete, wait 29 seconds and create a
new pod for the same application.
{code:sh}
#!/bin/bash
num_pods=100
for i in $(seq 1 $num_pods); do
    nohup ./check.sh $i >> check.log 2>&1 &
    echo "Start run pod pod$i"
done
{code}
* Create pod-1 (app: app1)
* Check the status of pod-1 every second
* Detect completion of pod-1 (at this point app1's state changes from Running
to Completing)
* sleep 29 (completingTimeout defaults to 30 seconds; this creates the new task
as close to that deadline as possible)
* Create pod2-1 (app: app1) (at this point app1's state may change from
Completing to Completed)
* If app1 reaches the Completed state, pod2-1 is never scheduled and remains in
the Pending state until it is manually deleted.
{code:sh}
#!/bin/bash
i=$1
kubectl apply -f pods/pod-${i}.yaml
while true
do
    POD_STATUS=$(kubectl get pod -n test pod${i} -o jsonpath='{.status.phase}')
    if [ "$POD_STATUS" == "Succeeded" ]; then
        echo "Pod pod${i} is in Completed state."
        sleep 29
        kubectl apply -f pods2/pod2-${i}.yaml
        exit
    else
        sleep 1
        echo "Pod pod$i is not in Completed state. Current state: $POD_STATUS"
    fi
done
{code}
h2. Complete logs of abnormal tasks
{code:none}
2024-11-27T12:27:49.535Z INFO cache/context.go:780 app added
{"appID": "case-90"}
2024-11-27T12:27:49.535Z INFO cache/context.go:871 task added
{"appID": "case-90", "taskID": "67c4c59c-677d-4217-b57c-b1ed57f175b8",
"taskState": "New"}
2024-11-27T12:27:49.535Z INFO cache/context.go:881 app request
originating pod added {"appID": "case-90", "original task":
"67c4c59c-677d-4217-b57c-b1ed57f175b8"}
2024-11-27T12:27:50.275Z INFO cache/application.go:428 handle
app submission {"app": "applicationID: case-90, queue: root.group.hrh.test,
partition: default, totalNumOfTasks: 1, currentState: Submitted", "clusterID":
"mycluster"}
2024-11-27T12:27:50.276Z INFO placement/provided_rule.go:109
Provided rule application placed {"application": "case-90", "queue":
"root.group.hrh.test"}
2024-11-27T12:27:50.276Z INFO scheduler/context.go:549 Added
application to partition {"applicationID": "case-90", "partitionName":
"[mycluster]default", "requested queue": "root.group.hrh.test", "placed queue":
"root.group.hrh.test"}
2024-11-27T12:27:50.276Z INFO callback/scheduler_callback.go:118
Accepting app {"appID": "case-90"}
2024-11-27T12:27:51.276Z INFO cache/application.go:526 Skip
the reservation stage {"appID": "case-90"}
2024-11-27T12:27:52.278Z INFO cache/task_state.go:380 Task state
transition {"app": "case-90", "task": "67c4c59c-677d-4217-b57c-b1ed57f175b8",
"taskAlias": "test/pod-app-90", "source": "New", "destination": "Pending",
"event": "InitTask"}
2024-11-27T12:27:52.279Z INFO cache/task_state.go:380 Task state
transition {"app": "case-90", "task": "67c4c59c-677d-4217-b57c-b1ed57f175b8",
"taskAlias": "test/pod-app-90", "source": "Pending", "destination":
"Scheduling", "event": "SubmitTask"}
2024-11-27T12:27:52.281Z INFO objects/application_state.go:133
Application state transition {"appID": "case-90", "source": "New",
"destination": "Accepted", "event": "runApplication"}
2024-11-27T12:27:52.281Z INFO objects/application.go:668 ask
added successfully to application {"appID": "case-90", "ask":
"67c4c59c-677d-4217-b57c-b1ed57f175b8", "placeholder": false, "pendingDelta":
"map[memory:31457280 pods:1 vcore:5]"}
2024-11-27T12:27:52.288Z INFO objects/application_state.go:133
Application state transition {"appID": "case-90", "source": "Accepted",
"destination": "Starting", "event": "runApplication"}
2024-11-27T12:27:52.288Z INFO scheduler/partition.go:892
scheduler allocation processed {"appID": "case-90", "allocationKey":
"67c4c59c-677d-4217-b57c-b1ed57f175b8", "uuid":
"40b93e8c-5c34-430e-a754-93859ca7fe0d", "allocatedResource":
"map[memory:31457280 pods:1 vcore:5]", "placeholder": false, "targetNode":
"node-a"}
2024-11-27T12:27:52.288Z INFO objects/application_state.go:133
Application state transition {"appID": "case-90", "source": "Starting",
"destination": "Running", "event": "runApplication"}
2024-11-27T12:27:52.288Z INFO cache/task_state.go:380 Task state
transition {"app": "case-90", "task": "67c4c59c-677d-4217-b57c-b1ed57f175b8",
"taskAlias": "test/pod-app-90", "source": "Scheduling", "destination":
"Allocated", "event": "TaskAllocated"}
2024-11-27T12:27:52.310Z INFO cache/task_state.go:380 Task state
transition {"app": "case-90", "task": "67c4c59c-677d-4217-b57c-b1ed57f175b8",
"taskAlias": "test/pod-app-90", "source": "Allocated", "destination": "Bound",
"event": "TaskBound"}
2024-11-27T12:30:24.701Z INFO cache/task_state.go:380 Task state
transition {"app": "case-90", "task": "67c4c59c-677d-4217-b57c-b1ed57f175b8",
"taskAlias": "test/pod-app-90", "source": "Bound", "destination": "Completed",
"event": "CompleteTask"}
2024-11-27T12:30:24.701Z INFO scheduler/partition.go:1249
removing allocation from application {"appID": "case-90", "allocationId":
"40b93e8c-5c34-430e-a754-93859ca7fe0d", "terminationType": "STOPPED_BY_RM"}
2024-11-27T12:30:24.701Z INFO objects/application_state.go:133
Application state transition {"appID": "case-90", "source": "Running",
"destination": "Completing", "event": "completeApplication"}
2024-11-27T12:30:54.637Z INFO cache/context.go:871 task added
{"appID": "case-90", "taskID": "d4b59711-aa03-45cf-a217-dcf6915556e5",
"taskState": "New"}
2024-11-27T12:30:54.702Z INFO objects/application_state.go:133
Application state transition {"appID": "case-90", "source": "Completing",
"destination": "Completed", "event": "completeApplication"}
2024-11-27T12:30:54.702Z INFO objects/queue.go:749 Application
completed and removed from queue {"queueName": "root.group.hrh.test",
"applicationID": "case-90"}
2024-11-27T12:30:54.702Z INFO scheduler/partition.go:1418
Removing terminated application from the application list {"appID":
"case-90", "app status": "Completed"}
2024-11-27T12:30:54.702Z INFO objects/application.go:127
YK_APP_SUMMARY: {"appID": "case-90", "submissionTime": 1732710470276,
"startTime": 1732710472288, "finishTime": 1732710654702, "user": "admin",
"queue": "root.group.hrh.test", "state": "Completed", "rmID": "mycluster",
"resourceUsage": {"UNKNOWN":{"memory":4781506560,"pods":152,"vcore":760}}}
{code}
> Orphan task generated during application state switch
> -----------------------------------------------------
>
> Key: YUNIKORN-2989
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2989
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler, shim - kubernetes
> Affects Versions: 1.3.0
> Environment: yunikorn 1.3.0
> Reporter: haorenhui
> Priority: Major
>
> h2. Production environment phenomenon
> I encountered a state transition issue during testing. The phenomenon is that
> I have two pods using the same applicationId, and pod2 is submitted at a
> certain point in time after pod1 completes running. When pod1 finishes
> running, the app (flow1) it belongs to transitions to the completing state,
> and it waits for 30 seconds (default) before changing to the completed state.
> When I submit pod2, it may remain in the pending state. Upon checking the
> yunikorn scheduler, the following log was found: after pod2 was added and its
> status changed to pending, the app was deleted.
> {code:none}
> 2024-11-25T12:07:43.137Z INFO objects/application_state.go:133 Application
> state transition {"appID": "flow1", "source": "Running", "destination":
> "Completing", "event": "completeApplication"}
> 2024-11-25T12:07:43.137Z INFO scheduler/partition.go:1249 removing allocation
> from application
> {"appID":"flow1","allocationId":"a2e95dfb-3646-42b8-b774-6b2abe84bae6","terminationType":
> "STOPPED_BY_RM"}
> 2024-11-25T12:08:12.903Z INFO cachefcontext.go:871 task added
> {"appID":"flow1","taskID":"16f461e9-a856-428e-a27b-e67d9105da69",
> "taskState": "New"}
> 2024-11-25T12:08:13.137Z INFO objects/application.go:127 YK_APP_SUMMARY:
> {"appID":
> "flow1","submissionTime":1732533620209,"startTime":1732533622247,"finishTime":
> 1732536493137,"user":"admin","queue":"root.default","state":
> "Completed","rmID":"cluster",resourceUsage":
> {"UNKNOHN":"ephemeral-storage":1009530880000000,"memory":1034360822300672,"pods":98587,"vcore":2221462721}}
> 24-11-25T12:08:13.137Z INFD scheduler/partition.go:1418 Renoving terminated
> applicetion from the application list {"appID": "flow1","app
> status":"Completed"}
> 2024-11-25T12:08:13.137Z INFO objects/application_state.go:133 Application
> state transition
> Unknown macro: \{"appID"}
> 2024-11-25T12:08:13.143Z INFO cache/task_state.go:380 Task state transition
> {"app":"flow1","task":"16f461e9-a856-428e-a27b-e67d91ca69","taskAlias":"test/pod2",
> "source": "New","destination":"Pending","event": "InitTask"}
> {code}
> # 16f461e9-a856-428e-a27b-e67d91ca69 will not be scheduled until I manually
> delete it.
> # This issue is not necessary and requires some specific triggering
> conditions (there are many pods and apps in my environment, and this problem
> occasionally occurs; if pod2 is submitted within 30 seconds, the app will
> switch to Running,Submit after 30 seconds, the app will be rebuilt).
> # I feel that when adding a task, the application in the core starts to
> change to completed, and then triggers callback to start clearing resources.
> After the app to which the task belongs is cleared, the task becomes an
> orphan task, and ultimately the task is not scheduled by anyone.
> h2. Test environment reproduction
> I have successfully replicated this situation in a stable testing
> environment.This is my testing procedure.
> Create 100 Pods simultaneously in a queue, each belonging to a separate
> Application (pod1->app1, pod2->app2...). These 100 Pods will automatically
> end after 30 seconds of startup (sleep 30). Afterwards, start the script to
> check the status of these pods. After detecting completion, wait for 29
> seconds and create pods for the same application.
> {code:sh}
> #!/bin/bash
> num_pods=100
> for i in $(seq 1 $num_pods); do
> nohup ./check.sh $i >> check.log 2>&1 &
> echo "Start run pod pod$i"
> done
> {code}
> * Create pod-1 (app: app1)
> * Check the status of pod-1 every 1 second
> * Detected completion of pod-1 (at this point, the status of app1 will
> change from running to completing)
> * sleep 29 (completingTimeout defaults to 30 seconds, try to create a new
> task as close to this time as possible)
> * Create pod2-1 (app: app1) (at this point, the status of app1 may change
> from 'completing' to 'completed')
> * If app-1 becomes in the completed state, pod2-1 will not be scheduled
> again and will remain in the pending state until it is manually deleted.
> {code:sh}
> #!/bin/bash
> i=$1
> kubectl apply -f pods/pod-${i}.yaml
> while true
> do
> POD_STATUS=$(kubectl get pod -n test pod${i} -o
> jsonpath='{.status.phase}')
> if [ "$POD_STATUS" == "Succeeded" ]; then
> echo "Pod pod${i} is in Completed state."
> sleep 29
> kubectl apply -f pods2/pod2-${i}.yaml
> exit
> else
> sleep 1
> echo "Pod pod$i is not in Completed state. Current state:
> $POD_STATUS"
> fi
> done
> {code}
>
> h2. Complete logs of abnormal tasks
> {code:none}
> 2024-11-27T12:27:49.535Z INFO cache/context.go:780 app added
> {"appID": "case-90"}
> 2024-11-27T12:27:49.535Z INFO cache/context.go:871 task added
> {"appID": "case-90", "taskID": "67c4c59c-677d-4217-b57c-b1ed57f175b8",
> "taskState": "New"}
> 2024-11-27T12:27:49.535Z INFO cache/context.go:881 app request
> originating pod added {"appID": "case-90", "original task":
> "67c4c59c-677d-4217-b57c-b1ed57f175b8"}
> 2024-11-27T12:27:50.275Z INFO cache/application.go:428
> handle app submission {"app": "applicationID: case-90, queue:
> root.group.hrh.test, partition: default, totalNumOfTasks: 1, currentState:
> Submitted", "clusterID": "mycluster"}
> 2024-11-27T12:27:50.276Z INFO placement/provided_rule.go:109
> Provided rule application placed {"application": "case-90", "queue":
> "root.group.hrh.test"}
> 2024-11-27T12:27:50.276Z INFO scheduler/context.go:549 Added
> application to partition {"applicationID": "case-90", "partitionName":
> "[mycluster]default", "requested queue": "root.group.hrh.test", "placed
> queue": "root.group.hrh.test"}
> 2024-11-27T12:27:50.276Z INFO callback/scheduler_callback.go:118
> Accepting app {"appID": "case-90"}
> 2024-11-27T12:27:51.276Z INFO cache/application.go:526 Skip
> the reservation stage {"appID": "case-90"}
> 2024-11-27T12:27:52.278Z INFO cache/task_state.go:380 Task state
> transition {"app": "case-90", "task":
> "67c4c59c-677d-4217-b57c-b1ed57f175b8", "taskAlias": "test/pod-app-90",
> "source": "New", "destination": "Pending", "event": "InitTask"}
> 2024-11-27T12:27:52.279Z INFO cache/task_state.go:380 Task state
> transition {"app": "case-90", "task":
> "67c4c59c-677d-4217-b57c-b1ed57f175b8", "taskAlias": "test/pod-app-90",
> "source": "Pending", "destination": "Scheduling", "event": "SubmitTask"}
> 2024-11-27T12:27:52.281Z INFO objects/application_state.go:133
> Application state transition {"appID": "case-90", "source": "New",
> "destination": "Accepted", "event": "runApplication"}
> 2024-11-27T12:27:52.281Z INFO objects/application.go:668 ask
> added successfully to application {"appID": "case-90", "ask":
> "67c4c59c-677d-4217-b57c-b1ed57f175b8", "placeholder": false, "pendingDelta":
> "map[memory:31457280 pods:1 vcore:5]"}
> 2024-11-27T12:27:52.288Z INFO objects/application_state.go:133
> Application state transition {"appID": "case-90", "source": "Accepted",
> "destination": "Starting", "event": "runApplication"}
> 2024-11-27T12:27:52.288Z INFO scheduler/partition.go:892
> scheduler allocation processed {"appID": "case-90", "allocationKey":
> "67c4c59c-677d-4217-b57c-b1ed57f175b8", "uuid":
> "40b93e8c-5c34-430e-a754-93859ca7fe0d", "allocatedResource":
> "map[memory:31457280 pods:1 vcore:5]", "placeholder": false, "targetNode":
> "node-a"}
> 2024-11-27T12:27:52.288Z INFO objects/application_state.go:133
> Application state transition {"appID": "case-90", "source": "Starting",
> "destination": "Running", "event": "runApplication"}
> 2024-11-27T12:27:52.288Z INFO cache/task_state.go:380 Task state
> transition {"app": "case-90", "task":
> "67c4c59c-677d-4217-b57c-b1ed57f175b8", "taskAlias": "test/pod-app-90",
> "source": "Scheduling", "destination": "Allocated", "event": "TaskAllocated"}
> 2024-11-27T12:27:52.310Z INFO cache/task_state.go:380 Task state
> transition {"app": "case-90", "task":
> "67c4c59c-677d-4217-b57c-b1ed57f175b8", "taskAlias": "test/pod-app-90",
> "source": "Allocated", "destination": "Bound", "event": "TaskBound"}
> 2024-11-27T12:30:24.701Z INFO cache/task_state.go:380 Task state
> transition {"app": "case-90", "task":
> "67c4c59c-677d-4217-b57c-b1ed57f175b8", "taskAlias": "test/pod-app-90",
> "source": "Bound", "destination": "Completed", "event": "CompleteTask"}
> 2024-11-27T12:30:24.701Z INFO scheduler/partition.go:1249
> removing allocation from application {"appID": "case-90", "allocationId":
> "40b93e8c-5c34-430e-a754-93859ca7fe0d", "terminationType": "STOPPED_BY_RM"}
> 2024-11-27T12:30:24.701Z INFO objects/application_state.go:133
> Application state transition {"appID": "case-90", "source": "Running",
> "destination": "Completing", "event": "completeApplication"}
> 2024-11-27T12:30:54.637Z INFO cache/context.go:871 task added
> {"appID": "case-90", "taskID": "d4b59711-aa03-45cf-a217-dcf6915556e5",
> "taskState": "New"}
> 2024-11-27T12:30:54.702Z INFO objects/application_state.go:133
> Application state transition {"appID": "case-90", "source":
> "Completing", "destination": "Completed", "event": "completeApplication"}
> 2024-11-27T12:30:54.702Z INFO objects/queue.go:749 Application
> completed and removed from queue {"queueName": "root.group.hrh.test",
> "applicationID": "case-90"}
> 2024-11-27T12:30:54.702Z INFO scheduler/partition.go:1418
> Removing terminated application from the application list {"appID":
> "case-90", "app status": "Completed"}
> 2024-11-27T12:30:54.702Z INFO objects/application.go:127
> YK_APP_SUMMARY: {"appID": "case-90", "submissionTime": 1732710470276,
> "startTime": 1732710472288, "finishTime": 1732710654702, "user": "admin",
> "queue": "root.group.hrh.test", "state": "Completed", "rmID": "mycluster",
> "resourceUsage": {"UNKNOWN":{"memory":4781506560,"pods":152,"vcore":760}}}
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)