[
https://issues.apache.org/jira/browse/YUNIKORN-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850562#comment-17850562
]
Wilfred Spiegelenburg commented on YUNIKORN-2645:
-------------------------------------------------
The side effect of that broken node is that every single pod we allocate
selects it: under the node sorting, the broken node stays first in the list of
nodes to try. Every pod gets placed on it and then fails to start. Because the
node's reported usage never changes, the node never gets pushed back in the
list of available nodes, so the scheduler makes no real progress.
I would consider that a hung scheduler, but I do not think there is anything
we can do about it without some major changes.
A possible solution would be, for instance, to rate limit the number of pods
we put on a node: never schedule more than 10 pods per second on a node,
counting or ignoring failures, and skip the node once that limit is hit. That
would make sure we try a couple of times and then move on to the next node. It
could cause a slight delay when a cluster is almost full. It would also add
some delay in an auto-scaling cluster, as the scheduler skips a node while the
auto scaler does not...
> parent queue exceeds maximum resource
> -------------------------------------
>
> Key: YUNIKORN-2645
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2645
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.5.1
> Reporter: Dmitry
> Priority: Major
> Attachments: yunikorn-logs.txt.gz
>
>
> We had a broken node in the cluster - Kubernetes was creating pods that were
> immediately failing with the "OutOfGPU" status. The node had 1000+ pods on it.
> The scheduler panicked with the attached log and was not scheduling any other
> pods.
> The config:
> {code:yaml}
> apiVersion: v1
> data:
>   admissionController.filtering.bypassNamespaces: ^kube-system$,^rook$,^rook-east$,^rook-central$,^rook-pacific$,^rook-south-east$,^rook-system$
>   queues.yaml: |
>     partitions:
>       - name: default
>         placementrules:
>           - name: fixed
>             value: root.scavenging.osg
>             create: true
>             filter:
>               type: allow
>               users:
>                 - system:serviceaccount:osg-ligo:prp-htcondor-provisioner
>                 - system:serviceaccount:osg-opportunistic:prp-htcondor-provisioner
>                 - system:serviceaccount:osg-icecube:prp-htcondor-provisioner
>           - name: tag
>             value: namespace
>             create: true
>             parent:
>               name: tag
>               value: namespace.parentqueue
>           - name: tag
>             value: namespace
>             create: true
>             parent:
>               name: fixed
>               value: general
>         nodesortpolicy:
>           type: fair
>           resourceweights:
>             vcore: 1.0
>             memory: 1.0
>             nvidia.com/gpu: 4.0
>         queues:
>           - name: root
>             submitacl: '*'
>             properties:
>               application.sort.policy: fair
>             queues:
>               - name: system
>                 parent: true
>                 properties:
>                   preemption.policy: disabled
>               - name: general
>                 parent: true
>                 childtemplate:
>                   properties:
>                     application.sort.policy: fair
>                   resources:
>                     guaranteed:
>                       vcore: 100
>                       memory: 1Ti
>                       nvidia.com/gpu: 8
>                     max:
>                       vcore: 4000
>                       memory: 15Ti
>                       nvidia.com/gpu: 200
>               - name: scavenging
>                 parent: true
>                 childtemplate:
>                   resources:
>                     guaranteed:
>                       vcore: 1
>                       memory: 1G
>                       nvidia.com/gpu: 1
>                 properties:
>                   priority.offset: "-10"
>               - name: interactive
>                 parent: true
>                 childtemplate:
>                   resources:
>                     guaranteed:
>                       vcore: 1000
>                       memory: 10T
>                       nvidia.com/gpu: 48
>                       nvidia.com/a100: 4
>                 properties:
>                   priority.offset: "10"
>                   preemption.policy: disabled
>               - name: clemson
>                 parent: true
>                 properties:
>                   application.sort.policy: fair
>                 resources:
>                   guaranteed:
>                     vcore: 256
>                     memory: 2T
>                     nvidia.com/gpu: 24
>               - name: nysernet
>                 parent: true
>                 properties:
>                   application.sort.policy: fair
>                 resources:
>                   guaranteed:
>                     vcore: 1000
>                     memory: 5T
>                     nvidia.com/gpu: 16
>               - name: gpn
>                 parent: true
>                 properties:
>                   application.sort.policy: fair
>                 resources:
>                   guaranteed:
>                     vcore: 5000
>                     memory: 50T
>                     nvidia.com/gpu: 256
>                     nvidia.com/a100: 16
>               - name: sdsu
>                 parent: true
>                 properties:
>                   application.sort.policy: fair
>                 resources:
>                   guaranteed:
>                     vcore: 1000
>                     memory: 15T
>                     nvidia.com/gpu: 112
>                     nvidia.com/a100: 64
>                 queues:
>                   - name: sdsu-jupyterhub
>                     parent: false
>                     properties:
>                       preemption.policy: disabled
>                       priority.offset: "10"
>                     resources:
>                       guaranteed:
>                         vcore: 700
>                         memory: 5T
>                         nvidia.com/gpu: 100
>               - name: tide
>                 parent: true
>                 properties:
>                   application.sort.policy: fair
>                 resources:
>                   guaranteed:
>                     vcore: 592
>                     memory: 15T
>                     nvidia.com/gpu: 72
>                 queues:
>                   - name: rook-tide
>                     parent: false
>                     properties:
>                       preemption.policy: disabled
>                       priority.offset: "10"
>                     resources:
>                       guaranteed:
>                         vcore: 500
>                         memory: 1T
>               - name: ucsc
>                 parent: true
>                 properties:
>                   application.sort.policy: fair
>                 resources:
>                   guaranteed:
>                     vcore: 500
>                     memory: 4T
>                     nvidia.com/gpu: 256
>               - name: ucsd
>                 parent: true
>                 properties:
>                   application.sort.policy: fair
>                 resources:
>                   guaranteed:
>                     vcore: 40000
>                     memory: 40T
>                     nvidia.com/gpu: 512
>                     nvidia.com/a100: 100
>                 queues:
>                   - name: ry
>                     parent: true
>                     properties:
>                       application.sort.policy: fair
>                     resources:
>                       guaranteed:
>                         vcore: 512
>                         memory: 8T
>                         nvidia.com/gpu: 144
>                   - name: suncave
>                     parent: false
>                     properties:
>                       preemption.policy: disabled
>                       priority.offset: "10"
>                     resources:
>                       guaranteed:
>                         vcore: 1000
>                         memory: 1T
>                   - name: dimm
>                     parent: false
>                     properties:
>                       preemption.policy: disabled
>                       priority.offset: "1000"
>                     resources:
>                       guaranteed:
>                         vcore: 1000
>                         memory: 1T
>               - name: haosu
>                 parent: true
>                 properties:
>                   application.sort.policy: fair
>                 resources:
>                   guaranteed:
>                     vcore: 5000
>                     memory: 10T
>                     nvidia.com/gpu: 120
>                 queues:
>                   - name: rook-haosu
>                     parent: false
>                     properties:
>                       preemption.policy: disabled
>                       priority.offset: "10"
>                     resources:
>                       guaranteed:
>                         vcore: 1000
>                         memory: 1T
> kind: ConfigMap
> metadata:
>   creationTimestamp: "2023-12-21T06:09:12Z"
>   name: yunikorn-configs
>   namespace: yunikorn
>   resourceVersion: "7764804169"
>   uid: 5b9b2c04-57af-4cab-84f8-b5f018952f9c
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]