[
https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882590#comment-17882590
]
Wilfred Spiegelenburg commented on YUNIKORN-2784:
-------------------------------------------------
I think we can get into this situation when the node is full and all pods
running on the node are either daemonset pods themselves or have a higher
priority than the daemonset pod we are preempting for. That does not seem to be
the case here. Although it is not much different from what is described above,
the reason we cannot find pods to preempt is slightly different.
What I can see is that the pod has a node selector defined for the node
{{prp-perfsonar-1.ucsc.edu}}, and we have reserved that same node,
{{prp-perfsonar-1.ucsc.edu}}, which is the correct node. The allocation has
the following annotation on it inside YuniKorn:
{code:java}
"yunikorn.apache.org/requiredNode": "prp-perfsonar-1.ucsc.edu" {code}
The question now is why the node does not allow the simple allocation. Tracking
back through the state dump, the node shows that we do not have enough
resources available to place the pod. This is the partial node detail from the dump:
{code:java}
"nodeID": "prp-perfsonar-1.ucsc.edu",
"capacity": {
"devices.kubevirt.io/kvm": 1000,
"devices.kubevirt.io/tun": 1000,
"devices.kubevirt.io/vhost-net": 1000,
"ephemeral-storage": 609974506511,
"hugepages-1Gi": 0,
"hugepages-2Mi": 0,
"memory": 16273350656,
"pods": 110,
"smarter-devices/fuse": 20,
"smarter-devices/vfio": 20,
"smarter-devices/vfio_vfio": 20,
"vcore": 16000
},
"allocated": {
"memory": 1073741824,
"pods": 1,
"vcore": 100
},
"occupied": {
"memory": 12673089536,
"pods": 15,
"vcore": 1883
},
"available": {
"devices.kubevirt.io/kvm": 1000,
"devices.kubevirt.io/tun": 1000,
"devices.kubevirt.io/vhost-net": 1000,
"ephemeral-storage": 609974506511,
"hugepages-1Gi": 0,
"hugepages-2Mi": 0,
"memory": 2526519296,
"pods": 94,
"smarter-devices/fuse": 20,
"smarter-devices/vfio": 20,
"smarter-devices/vfio_vfio": 20,
"vcore": 14017
},
{code}
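A quick check of those memory numbers (just redoing the arithmetic from the
dump, nothing YuniKorn specific): the available memory is capacity minus what
YuniKorn allocated minus what is occupied by other pods.
{code:go}
package main

import "fmt"

func main() {
	// Memory figures in bytes, copied from the node detail above.
	capacity := int64(16273350656)  // total node memory
	allocated := int64(1073741824)  // allocated through YuniKorn (1Gi)
	occupied := int64(12673089536)  // occupied by pods YuniKorn did not place

	available := capacity - allocated - occupied
	fmt.Println(available) // 2526519296 bytes, roughly 2.35Gi, matching "available"
}
{code}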
The only other pod that YuniKorn is aware of on that node is another daemonset
pod, with the ID dae0ed3b-2cbd-4286-96b6-e220ffcaacb7. The pod we are trying to
place requests 3Gi of memory and there is only about 2.5Gi available. So we
reserve the node and try to preempt on that specific node. The other daemonset
pod is filtered out as a victim, which leaves us with nothing to preempt, so
the preemption attempt stops there.
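In bytes the 3Gi request is 3221225472 against 2526519296 available, so the pod
is roughly 0.65Gi short, which is why we end up on the reservation and
preemption path at all. To make the filtering effect easier to follow, here is
a minimal illustrative sketch of it; this is not the actual YuniKorn preemption
code, and the type and function names are made up:
{code:go}
package main

import "fmt"

// allocation is a hypothetical stand-in for what YuniKorn knows about a pod
// on the node; it is not a real YuniKorn type.
type allocation struct {
	podID       string
	isDaemonSet bool
}

// victimCandidates mirrors the filtering effect described above: daemonset
// pods are not considered victims, and pods placed by other schedulers never
// appear in the input at all (see YUNIKORN-2791).
func victimCandidates(allocs []allocation) []allocation {
	var victims []allocation
	for _, a := range allocs {
		if a.isDaemonSet {
			continue // do not preempt another daemonset pod
		}
		victims = append(victims, a)
	}
	return victims
}

func main() {
	// The only allocation YuniKorn is aware of on prp-perfsonar-1.ucsc.edu:
	known := []allocation{
		{podID: "dae0ed3b-2cbd-4286-96b6-e220ffcaacb7", isDaemonSet: true},
	}
	fmt.Println(len(victimCandidates(known))) // 0: nothing left to preempt
}
{code}
With only the one daemonset allocation known to YuniKorn, the candidate list
comes back empty, and the 12.6Gi of occupied memory belongs to pods the filter
never even sees.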
This is a side effect of running multiple schedulers in the cluster. The node
is occupied by pods placed by the default scheduler, and YuniKorn does not see
those pods (yet), as per YUNIKORN-2791. That leaves us in a state where we
cannot find anything to preempt and thus cannot get the pod up and running.
This is one of the main reasons not to run multiple schedulers in a cluster.
> Scheduler stuck
> ---------------
>
> Key: YUNIKORN-2784
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2784
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Dmitry
> Priority: Major
> Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot
> 2024-08-02 at 1.20.23 PM.png, dumps.tgz, logs
>
>
> Shortly after switching to YuniKorn, a bunch of tiny pods get stuck pending
> (screenshot 1). Other pods get stuck as well, but these are the most visible
> and should be running 100% of the time.
> After restarting the scheduler, they all get scheduled immediately (screenshot 2).
> Attaching the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and
> `/debug/pprof/goroutine?debug=2`
> Also logs from the scheduler.