[
https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882590#comment-17882590
]
Wilfred Spiegelenburg commented on YUNIKORN-2784:
-------------------------------------------------
I think we can get into this situation when the node is full and all pods
running on the node are either daemonset pods themselves or have a higher
priority than the daemonset pod we are preempting for. That does not seem to be
the case here. Although it is not much different from what is described above,
the reason we cannot find pods to preempt is slightly different.
What I can see is that the pod has a node selector defined for the node
{{prp-perfsonar-1.ucsc.edu}}, and we have reserved that same node,
{{prp-perfsonar-1.ucsc.edu}}, which is the correct node. The allocation has
the following annotation on it inside YuniKorn:
{code:java}
"yunikorn.apache.org/requiredNode": "prp-perfsonar-1.ucsc.edu" {code}
The question now is why the node does not allow the simple allocation. Tracking
back through the state dump, the node shows that we do not have enough
resources available to place the pod. This is the partial node detail from the dump:
{code:java}
"nodeID": "prp-perfsonar-1.ucsc.edu",
"capacity": {
"devices.kubevirt.io/kvm": 1000,
"devices.kubevirt.io/tun": 1000,
"devices.kubevirt.io/vhost-net": 1000,
"ephemeral-storage": 609974506511,
"hugepages-1Gi": 0,
"hugepages-2Mi": 0,
"memory": 16273350656,
"pods": 110,
"smarter-devices/fuse": 20,
"smarter-devices/vfio": 20,
"smarter-devices/vfio_vfio": 20,
"vcore": 16000
},
"allocated": {
"memory": 1073741824,
"pods": 1,
"vcore": 100
},
"occupied": {
"memory": 12673089536,
"pods": 15,
"vcore": 1883
},
"available": {
"devices.kubevirt.io/kvm": 1000,
"devices.kubevirt.io/tun": 1000,
"devices.kubevirt.io/vhost-net": 1000,
"ephemeral-storage": 609974506511,
"hugepages-1Gi": 0,
"hugepages-2Mi": 0,
"memory": 2526519296,
"pods": 94,
"smarter-devices/fuse": 20,
"smarter-devices/vfio": 20,
"smarter-devices/vfio_vfio": 20,
"vcore": 14017
},
{code}
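A quick check of those memory numbers (just redoing the arithmetic from the
dump, nothing YuniKorn specific): the available memory is capacity minus what
YuniKorn allocated minus what is occupied by other pods.
{code:go}
package main

import "fmt"

func main() {
	// Memory figures in bytes, copied from the node detail above.
	capacity := int64(16273350656)  // total node memory
	allocated := int64(1073741824)  // allocated through YuniKorn (1Gi)
	occupied := int64(12673089536)  // occupied by pods YuniKorn did not place

	available := capacity - allocated - occupied
	fmt.Println(available) // 2526519296 bytes, roughly 2.35Gi, matching "available"
}
{code}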
The only other pod that YuniKorn is aware of on that node is another daemonset
pod, with the ID dae0ed3b-2cbd-4286-96b6-e220ffcaacb7. The pod we are trying to
place requests 3Gi of memory and there is only about 2.5Gi available. So we
reserve the node and try to preempt on that specific node. The other daemonset
pod is filtered out as a victim, which leaves us with nothing to preempt, so
the preemption attempt stops there.
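In bytes the 3Gi request is 3221225472 against 2526519296 available, so the pod
is roughly 0.65Gi short, which is why we end up on the reservation and
preemption path at all. To make the filtering effect easier to follow, here is
a minimal illustrative sketch of it; this is not the actual YuniKorn preemption
code, and the type and function names are made up:
{code:go}
package main

import "fmt"

// allocation is a hypothetical stand-in for what YuniKorn knows about a pod
// on the node; it is not a real YuniKorn type.
type allocation struct {
	podID       string
	isDaemonSet bool
}

// victimCandidates mirrors the filtering effect described above: daemonset
// pods are not considered victims, and pods placed by other schedulers never
// appear in the input at all (see YUNIKORN-2791).
func victimCandidates(allocs []allocation) []allocation {
	var victims []allocation
	for _, a := range allocs {
		if a.isDaemonSet {
			continue // do not preempt another daemonset pod
		}
		victims = append(victims, a)
	}
	return victims
}

func main() {
	// The only allocation YuniKorn is aware of on prp-perfsonar-1.ucsc.edu:
	known := []allocation{
		{podID: "dae0ed3b-2cbd-4286-96b6-e220ffcaacb7", isDaemonSet: true},
	}
	fmt.Println(len(victimCandidates(known))) // 0: nothing left to preempt
}
{code}
With only the one daemonset allocation known to YuniKorn, the candidate list
comes back empty, and the 12.6Gi of occupied memory belongs to pods the filter
never even sees.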
This is a side effect of running multiple schedulers in the cluster. The node
is occupied by pods placed by the default scheduler, and YuniKorn does not see
those pods (yet), as per YUNIKORN-2791. That leaves us in a state where we
cannot find anything to preempt and thus cannot get the pod up and running.
This is one of the main reasons not to run multiple schedulers in a cluster.
> Scheduler stuck
> ---------------
>
> Key: YUNIKORN-2784
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2784
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Dmitry
> Priority: Major
> Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot
> 2024-08-02 at 1.20.23 PM.png, dumps.tgz, logs
>
>
> Shortly after switching to YuniKorn, a bunch of tiny pods get stuck pending
> (screenshot 1). Other pods get stuck as well, but these are the most visible
> and should be running 100% of the time.
> After restarting the scheduler, they all get scheduled immediately (screenshot 2).
> Attaching the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and
> `/debug/pprof/goroutine?debug=2`
> Also logs from the scheduler.