[
https://issues.apache.org/jira/browse/YUNIKORN-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024314#comment-18024314
]
Peter Bacsko commented on YUNIKORN-3128:
----------------------------------------
[~ruiwenzhao] the problem is that the error occurs a bit too late in the
scheduling workflow. At this point, the scheduler-core has already registered
the allocation and the shim is executing the final steps (volume/pod binding).
If there's any error during these steps, there's no retry. Since it's
relatively rare, we've received very few (if any) complaints over the years,
but it's definitely something that we need to take care of.
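For illustration only, below is a minimal sketch (not the shim's actual code
path in client/kubeclient.go) of how the binding call could be wrapped in a
retry with backoff using client-go's helpers. The function name, the backoff
values and the set of retriable errors are assumptions, not a proposed patch:
{code:go}
package retryexample

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// bindPodWithRetry is a hypothetical helper, not the shim's real bind path:
// it retries the final pod-binding step with exponential backoff so that a
// transient "etcdserver: request timed out" does not immediately fail the task.
func bindPodWithRetry(ctx context.Context, client kubernetes.Interface, pod *v1.Pod, nodeName string) error {
	backoff := wait.Backoff{
		Steps:    5,                      // give up after 5 attempts
		Duration: 500 * time.Millisecond, // initial delay
		Factor:   2.0,                    // exponential growth
		Jitter:   0.1,
	}
	// Treat timeouts, throttling and apiserver/etcd hiccups as retriable.
	retriable := func(err error) bool {
		return apierrors.IsServerTimeout(err) ||
			apierrors.IsTimeout(err) ||
			apierrors.IsTooManyRequests(err) ||
			apierrors.IsConflict(err) ||
			apierrors.IsInternalError(err)
	}
	return retry.OnError(backoff, retriable, func() error {
		return client.CoreV1().Pods(pod.Namespace).Bind(ctx, &v1.Binding{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, UID: pod.UID},
			Target:     v1.ObjectReference{Kind: "Node", Name: nodeName},
		}, metav1.CreateOptions{})
	})
}
{code}
Even with such a retry, exhausting the backoff still has to be reported back so
that the core can release the already-registered allocation instead of leaving
the pod Pending forever.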
> Yunikorn ignores pending pods after apiserver errors
> ----------------------------------------------------
>
> Key: YUNIKORN-3128
> URL: https://issues.apache.org/jira/browse/YUNIKORN-3128
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.7.0
> Environment: EKS 1.31
> Reporter: Ruiwen Zhao
> Priority: Major
>
> We are running load tests with Yunikorn, where pods are created at 200/s and
> we monitor whether Yunikorn can schedule them at the same rate.
>
> One issue we saw is that Yunikorn ends up ignoring a large number of pods
> (~2000) at the end of the load test and completes the application. As shown
> below, there are many pods still Pending, but Yunikorn completes the
> application they belong to, and therefore those pods are stuck. All the pods
> have "schedulerName: yunikorn".
>
> {code:java}
> ❯ kc get pods -n spark8s-kube-burner-yunikorn | grep Pending | head
> kube-burner-0-0-82077 0/1 Pending 0 16m
> kube-burner-0-0-82105 0/1 Pending 0 16m
> kube-burner-0-0-82129 0/1 Pending 0 16m
> kube-burner-0-0-82132 0/1 Pending 0 16m
> kube-burner-0-0-82140 0/1 Pending 0 16m
> kube-burner-0-0-82141 0/1 Pending 0 16m
> 2025-09-29T18:28:18.866Z INFO core.scheduler.fsm
> objects/application_state.go:147 Application state transition {"appID":
> "yunikorn-spark8s-kube-burner-yunikorn-0", "source": "Completing",
> "destination": "Completed", "event": "completeApplication"} {code}
> When looking at one of the Pending pods (kube-burner-0-0-82077), we can see
> that Yunikorn was trying to schedule it but failed to do so because of etcd
> errors. Yunikorn retried once, failed again, and then submitted the task
> again, but there are no logs after that:
> {code:java}
> 2025-09-30T21:18:45.248Z INFO shim.fsm cache/task_state.go:381 Task state
> transition {"app": "yunikorn-spark8s-kube-burner-yunikorn-0", "task":
> "731dc815-9ee0-4767-a5a9-939219b94f6e", "taskAlias":
> "spark8s-kube-burner-yunikorn/kube-burner-0-0-82077", "source": "New",
> "destination": "Pending", "event": "InitTask"}
> 2025-09-30T21:18:45.260Z INFO shim.fsm cache/task_state.go:381 Task state
> transition {"app": "yunikorn-spark8s-kube-burner-yunikorn-0", "task":
> "731dc815-9ee0-4767-a5a9-939219b94f6e", "taskAlias":
> "spark8s-kube-burner-yunikorn/kube-burner-0-0-82077", "source": "Pending",
> "destination": "Scheduling", "event": "SubmitTask"}
> 2025-09-30T21:18:59.464Z ERROR shim.client client/kubeclient.go:127 failed to
> bind pod {"namespace": "spark8s-kube-burner-yunikorn", "podName":
> "kube-burner-0-0-82077", "error": "Operation cannot be fulfilled on
> pods/binding \"kube-burner-0-0-82077\": etcdserver: request timed out"}
> 2025-09-30T21:18:59.465Z ERROR shim.cache.task cache/task.go:464 task failed
> {"appID": "yunikorn-spark8s-kube-burner-yunikorn-0", "taskID":
> "731dc815-9ee0-4767-a5a9-939219b94f6e", "reason": "bind pod to node failed,
> name: spark8s-kube-burner-yunikorn/kube-burner-0-0-82077, Operation cannot be
> fulfilled on pods/binding \"kube-burner-0-0-82077\": etcdserver: request
> timed out"}
> 2025-09-30T21:18:59.465Z INFO shim.fsm cache/task_state.go:381 Task state
> transition {"app": "yunikorn-spark8s-kube-burner-yunikorn-0", "task":
> "731dc815-9ee0-4767-a5a9-939219b94f6e", "taskAlias":
> "spark8s-kube-burner-yunikorn/kube-burner-0-0-82077", "source": "Allocated",
> "destination": "Failed", "event": "TaskFail"}
> 2025-09-30T21:18:59.464Z ERROR shim.cache.task cache/task.go:388 bind pod to
> node failed {"taskID": "731dc815-9ee0-4767-a5a9-939219b94f6e", "error":
> "Operation cannot be fulfilled on pods/binding \"kube-burner-0-0-82077\":
> etcdserver: request timed out"}
> 2025-09-30T21:19:25.464Z INFO shim.fsm cache/task_state.go:381 Task state
> transition {"app": "yunikorn-spark8s-kube-burner-yunikorn-0", "task":
> "731dc815-9ee0-4767-a5a9-939219b94f6e", "taskAlias":
> "spark8s-kube-burner-yunikorn/kube-burner-0-0-82077", "source": "New",
> "destination": "Pending", "event": "InitTask"}
> 2025-09-30T21:19:25.464Z INFO shim.fsm cache/task_state.go:381 Task state
> transition {"app": "yunikorn-spark8s-kube-burner-yunikorn-0", "task":
> "731dc815-9ee0-4767-a5a9-939219b94f6e", "taskAlias":
> "spark8s-kube-burner-yunikorn/kube-burner-0-0-82077", "source": "Pending",
> "destination": "Scheduling", "event": "SubmitTask"} {code}
> The failure seems to be caused by an etcd timeout, which makes sense, but IMO
> the expected behavior is that Yunikorn keeps trying to schedule the pods with
> backoff (a rough sketch of what we mean is included at the end of this
> description).
>
> Yunikorn version: 1.7.0
> Env: EKS 1.31
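>
> For reference, here is a minimal sketch of the kind of backoff retry we have
> in mind, using client-go's rate-limited workqueue. This is illustrative only,
> not YuniKorn code; the names and delay values are assumptions:
> {code:go}
> package retryexample
>
> import (
> 	"fmt"
> 	"time"
>
> 	"k8s.io/client-go/util/workqueue"
> )
>
> // requeueLoop is illustrative only: pods whose scheduling or binding failed
> // are re-enqueued on a rate-limited queue, so they are retried with growing
> // delays instead of being dropped while the application moves to Completed.
> func requeueLoop(tryScheduleAgain func(podKey string) error) {
> 	// Per-item exponential backoff from 1s up to 5m (values are assumptions).
> 	limiter := workqueue.NewItemExponentialFailureRateLimiter(1*time.Second, 5*time.Minute)
> 	queue := workqueue.NewRateLimitingQueue(limiter)
> 	defer queue.ShutDown()
>
> 	// A pod that failed to bind is added with backoff instead of being forgotten.
> 	queue.AddRateLimited("spark8s-kube-burner-yunikorn/kube-burner-0-0-82077")
>
> 	for {
> 		key, shutdown := queue.Get()
> 		if shutdown {
> 			return
> 		}
> 		podKey := key.(string)
> 		if err := tryScheduleAgain(podKey); err != nil {
> 			fmt.Println("retrying later:", podKey, err)
> 			queue.AddRateLimited(key) // next attempt waits longer
> 		} else {
> 			queue.Forget(key) // success: reset the per-item backoff
> 		}
> 		queue.Done(key)
> 	}
> }
> {code}
> Even a bounded number of retries would be fine, as long as the pods are not
> silently dropped when the application completes.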