[jira] [Commented] (YUNIKORN-2784) Scheduler stuck

Dmitry (Jira) Wed, 18 Sep 2024 19:30:47 -0700


    [ 
https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882851#comment-17882851
 ]


Dmitry commented on YUNIKORN-2784:
----------------------------------

{code:yaml}
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: c61c6a5aeff96ad39881f401d5a0653c4b0d4c3e14465db912550816250820b3
    cni.projectcalico.org/podIP: 10.244.163.83/32
    cni.projectcalico.org/podIPs: 10.244.163.83/32
  creationTimestamp: "2024-08-13T01:29:15Z"
  generateName: ipmi-mon-
  labels:
    controller-revision-hash: 655f5857c9
    k8s-app: ipmi-mon
    pod-template-generation: "16"
  name: ipmi-mon-2xg4x
  namespace: ipmi
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: ipmi-mon
    uid: 4d8318b1-d969-11e8-ac6e-0cc47a6be994
  resourceVersion: "8423120849"
  uid: 145c4ea6-fa2f-453f-af1c-0a6f4ff57ff7
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - k8s-bf2-01.sdsc.optiputer.net
  containers:
  - image: lovoo/ipmi_exporter
    imagePullPolicy: IfNotPresent
    name: ipmi-mon
    resources: {}
    securityContext:
      privileged: true
      procMount: Default
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9j7t4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: k8s-bf2-01.sdsc.optiputer.net
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
{code}


> Scheduler stuck
> ---------------
>
>                 Key: YUNIKORN-2784
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2784
>             Project: Apache YuniKorn
>          Issue Type: Bug
>            Reporter: Dmitry
>            Priority: Major
>         Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot 
> 2024-08-02 at 1.20.23 PM.png, Screenshot 2024-09-18 at 7.26.17 PM.png, 
> dumps.tgz, logs
>
>
> Shortly after switching to YuniKorn, a number of tiny pods get stuck in 
> Pending (screenshot 1). All other pods are affected as well, but these are 
> the most visible and should be running 100%.
> After restarting the scheduler, all of them get scheduled immediately 
> (screenshot 2).
> Attaching the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and 
> `/debug/pprof/goroutine?debug=2`, as well as the scheduler logs.
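
For anyone trying to gather the same diagnostics before restarting a stuck scheduler, here is a minimal Python sketch that fetches the three endpoints named in the description and saves each response to a file. The base URL, the output file names, and the helper names `diagnostic_urls` and `collect` are assumptions made for illustration; they are not taken from the report. In particular, the sketch assumes the scheduler's REST service has been made reachable locally (for example through a port-forward) at a host and port of your choosing.

```python
from urllib.request import urlopen

# Endpoints listed in the issue description, keyed by a local output
# file name (the file names are illustrative assumptions).
DIAGNOSTIC_PATHS = {
    "stack.txt": "/ws/v1/stack",
    "fullstatedump.json": "/ws/v1/fullstatedump",
    "goroutines.txt": "/debug/pprof/goroutine?debug=2",
}


def diagnostic_urls(base_url):
    """Return a mapping of output file name to full endpoint URL."""
    base = base_url.rstrip("/")
    return {name: base + path for name, path in DIAGNOSTIC_PATHS.items()}


def collect(base_url, timeout=30):
    """Fetch every endpoint and write each response body to a local file."""
    for name, url in diagnostic_urls(base_url).items():
        with urlopen(url, timeout=timeout) as resp, open(name, "wb") as out:
            out.write(resp.read())
```

Calling `collect("http://localhost:9080")` would write the three files into the current directory, assuming the service is reachable at that address. The dumps have to be captured while the pods are still pending, since the description reports that a restart gets everything scheduled again.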



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
