[
https://issues.apache.org/jira/browse/YUNIKORN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882851#comment-17882851
]
Dmitry commented on YUNIKORN-2784:
----------------------------------
{code:yaml}
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: c61c6a5aeff96ad39881f401d5a0653c4b0d4c3e14465db912550816250820b3
    cni.projectcalico.org/podIP: 10.244.163.83/32
    cni.projectcalico.org/podIPs: 10.244.163.83/32
  creationTimestamp: "2024-08-13T01:29:15Z"
  generateName: ipmi-mon-
  labels:
    controller-revision-hash: 655f5857c9
    k8s-app: ipmi-mon
    pod-template-generation: "16"
  name: ipmi-mon-2xg4x
  namespace: ipmi
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: ipmi-mon
    uid: 4d8318b1-d969-11e8-ac6e-0cc47a6be994
  resourceVersion: "8423120849"
  uid: 145c4ea6-fa2f-453f-af1c-0a6f4ff57ff7
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - k8s-bf2-01.sdsc.optiputer.net
  containers:
  - image: lovoo/ipmi_exporter
    imagePullPolicy: IfNotPresent
    name: ipmi-mon
    resources: {}
    securityContext:
      privileged: true
      procMount: Default
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9j7t4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: k8s-bf2-01.sdsc.optiputer.net
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
{code}
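The manifest above describes a DaemonSet pod (its owner reference has `kind: DaemonSet`) that is pinned to a single node by a required `matchFields` term on `metadata.name`, and whose `schedulerName` is `default-scheduler`. As a minimal sketch for checking those fields when triaging similar pending pods, the hypothetical helper `pinned_node` reads the pinned node name out of a pod manifest. It assumes the manifest has already been parsed into a plain Python dict (for example with a YAML parser) and reproduces only the fields it inspects; it is not part of YuniKorn or Kubernetes.

```python
from typing import Optional


def pinned_node(pod: dict) -> Optional[str]:
    """Return the node a pod is pinned to via a required nodeAffinity
    matchFields term (key metadata.name, operator In, one value),
    or None if the pod carries no such single-node pin."""
    affinity = pod.get("spec", {}).get("affinity", {})
    required = affinity.get("nodeAffinity", {}).get(
        "requiredDuringSchedulingIgnoredDuringExecution", {}
    )
    for term in required.get("nodeSelectorTerms", []):
        for field in term.get("matchFields", []):
            if (
                field.get("key") == "metadata.name"
                and field.get("operator") == "In"
                and len(field.get("values", [])) == 1
            ):
                return field["values"][0]
    return None


# Trimmed copy of the relevant fields from the manifest above.
pod = {
    "spec": {
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {"matchFields": [{
                            "key": "metadata.name",
                            "operator": "In",
                            "values": ["k8s-bf2-01.sdsc.optiputer.net"],
                        }]}
                    ]
                }
            }
        },
        "nodeName": "k8s-bf2-01.sdsc.optiputer.net",
        "schedulerName": "default-scheduler",
    }
}

print(pinned_node(pod))              # → k8s-bf2-01.sdsc.optiputer.net
print(pod["spec"]["schedulerName"])  # → default-scheduler
```

Comparing `pinned_node(pod)` with `spec.nodeName` and `spec.schedulerName` across the stuck pods is one quick way to see whether they share the same single-node pinning.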
> Scheduler stuck
> ---------------
>
> Key: YUNIKORN-2784
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2784
> Project: Apache YuniKorn
> Issue Type: Bug
> Reporter: Dmitry
> Priority: Major
> Attachments: Screenshot 2024-08-02 at 1.16.30 PM.png, Screenshot 2024-08-02 at 1.20.23 PM.png,
> Screenshot 2024-09-18 at 7.26.17 PM.png, dumps.tgz, logs
>
>
> Shortly after switching to YuniKorn, a number of small pods get stuck in Pending
> (screenshot 1). Other pods are affected as well, but these are the most visible
> and should be running at all times.
> After the scheduler is restarted, all of them are scheduled immediately (screenshot 2).
> Attached is the output of `/ws/v1/stack`, `/ws/v1/fullstatedump` and
> `/debug/pprof/goroutine?debug=2`, along with the scheduler logs.
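The three diagnostic outputs listed in the issue description can be gathered in one pass with a small script. This is a minimal sketch, not a YuniKorn tool: the helper names `dump_urls` and `collect_dumps`, the output file names, and the base URL are hypothetical, and it assumes the scheduler's REST endpoint is reachable from where the script runs (for example through `kubectl port-forward`).

```python
import os
import urllib.request

# Diagnostic endpoints listed in the issue description, mapped to
# (hypothetical) local file names.
DUMP_PATHS = {
    "stack.txt": "/ws/v1/stack",
    "fullstatedump.json": "/ws/v1/fullstatedump",
    "goroutines.txt": "/debug/pprof/goroutine?debug=2",
}


def dump_urls(base_url: str) -> dict:
    """Map each output file name to its full endpoint URL under base_url."""
    base = base_url.rstrip("/")
    return {name: base + path for name, path in DUMP_PATHS.items()}


def collect_dumps(base_url: str, out_dir: str, timeout: float = 30.0) -> list:
    """Download every endpoint into out_dir and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name, url in dump_urls(base_url).items():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
        path = os.path.join(out_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        written.append(path)
    return written
```

For example, `collect_dumps("http://localhost:9080", "yunikorn-dumps")` would write all three files into `yunikorn-dumps/`; port 9080 is an assumption about the deployment's REST service and may differ. Capturing them at the moment pods are stuck, before restarting the scheduler, preserves the state that the restart clears.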