[
https://issues.apache.org/jira/browse/YUNIKORN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871541#comment-17871541
]
Wilfred Spiegelenburg commented on YUNIKORN-2678:
-------------------------------------------------
The current calculation is broken, and I have gone through the Slack
discussion. I can see why we would want to use the max resource as a
substitute for the guaranteed resource. Looking forward to a PR.
One point I would already make: the max used should rely only on the values
configured in the hierarchy. The current cluster size must not be taken into
account, so the root maximum must be ignored when we look at this. Besides
that, looking at the {{internalGetMax()}} code, there is a bug there for
which I will file a Jira. That will most likely influence this sorting, as it
revolves around setting 0 values.
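To make the suggestion concrete, here is a minimal Go sketch of the usage ratio a queue sorter could compute, falling back from the guaranteed resource to the configured max and treating an unset value (such as an ignored root maximum) as "fully used". All type and function names here are illustrative, not the actual yunikorn-core API:

```go
package main

import "fmt"

// Queue is a simplified stand-in for a scheduler queue; the real
// yunikorn-core structures track full multi-resource vectors.
type Queue struct {
	Name       string
	Allocated  int64 // vcores currently allocated
	Guaranteed int64 // configured guaranteed vcores (0 = unset)
	Max        int64 // configured max vcores (0 = unset; root max is ignored)
}

// fairRatio returns allocated divided by the guarantee, falling back to the
// configured max when no guarantee is set. Queues with neither value
// configured compare as fully used, so configured queues sort ahead of them.
func fairRatio(q Queue) float64 {
	base := q.Guaranteed
	if base == 0 {
		base = q.Max
	}
	if base == 0 {
		return 1.0
	}
	return float64(q.Allocated) / float64(base)
}

func main() {
	under := Queue{Name: "tier0", Allocated: 1, Guaranteed: 2}
	over := Queue{Name: "tier3", Allocated: 8, Guaranteed: 2}
	fmt.Printf("%s ratio=%.2f\n", under.Name, fairRatio(under))
	fmt.Printf("%s ratio=%.2f\n", over.Name, fairRatio(over))
}
```

A queue still under its guarantee yields a ratio below 1 and would sort ahead of queues that have overshot theirs, which is the behaviour the reporter expected from the fair sorter.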
> Yunikorn does not appear to be considering Guaranteed resources when
> allocating Pending Pods.
> ---------------------------------------------------------------------------------------------
>
> Key: YUNIKORN-2678
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2678
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.5.1
> Environment: EKS 1.29
> Reporter: Paul Santa Clara
> Assignee: Paul Santa Clara
> Priority: Major
> Attachments: Screenshot 2024-08-06 at 5.18.18 PM.png, Screenshot
> 2024-08-06 at 5.18.21 PM.png, Screenshot 2024-08-06 at 5.18.30 PM.png,
> jira-queues.yaml, jira-tier0-screenshot.png, jira-tier1-screenshot.png,
> jira-tier2-screenshot.png, jira-tier3-screenshot.png
>
>
> Please see the attached queue configuration(jira-queues.yaml).
> I will create 100 pods in Tier0, 100 pods in Tier1, 100 pods in Tier2 and 100
> pods in Tier3. Each Pod will require 1 VCore. Initially, there will be 0
> suitable nodes to run the Pods and all will be Pending. Karpenter will soon
> provision Nodes and Yunikorn will react by binding the Pods.
> Given this
> [code|https://github.com/apache/yunikorn-core/blob/a786feb5761be28e802d08976d224c40639cd86b/pkg/scheduler/objects/sorters.go#L81C74-L81C95],
> I would expect Yunikorn to distribute the allocations so that each of the
> tiered queues reaches its guarantees. Instead, I observed a roughly even
> distribution of allocations across all of the queues.
> Tier0 fails to meet its guarantees while Tier3, for instance, dramatically
> overshoots them.
>
> {code:bash}
> > kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l
> 86
> > kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l
> 83
> > kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l
> 78
> > kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l
> 77
> {code}
> Please see the attached screenshots for queue usage.
> Note, this situation can also be reproduced without Karpenter by simply
> setting Yunikorn's `service.schedulingInterval` to a high duration, say 1m.
> Doing so makes Yunikorn react to 400 Pods -across 4 queues- at roughly the
> same time, which forces it to prioritize allocations between the queues.
> Test code to generate Pods:
> {code:python}
> from kubernetes import client, config
>
> config.load_kube_config()
> v1 = client.CoreV1Api()
>
> def create_pod_manifest(tier, exec):
>     pod_manifest = {
>         'apiVersion': 'v1',
>         'kind': 'Pod',
>         'metadata': {
>             'name': f"rolling-test-tier-{tier}-exec-{exec}",
>             'namespace': 'finance',
>             'labels': {
>                 'applicationId': f"MyOwnApplicationId-tier-{tier}",
>                 'queue': f"root.tiers.{tier}"
>             },
>             'annotations': {
>                 "yunikorn.apache.org/user.info": '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}'
>             }
>         },
>         'spec': {
>             "affinity": {
>                 "nodeAffinity": {
>                     "requiredDuringSchedulingIgnoredDuringExecution": {
>                         "nodeSelectorTerms": [
>                             {
>                                 "matchExpressions": [
>                                     {
>                                         "key": "di.rbx.com/dedicated",
>                                         "operator": "In",
>                                         "values": ["spark"]
>                                     }
>                                 ]
>                             }
>                         ]
>                     }
>                 }
>             },
>             "tolerations": [
>                 {
>                     "effect": "NoSchedule",
>                     "key": "dedicated",
>                     "operator": "Equal",
>                     "value": "spark"
>                 }
>             ],
>             "schedulerName": "yunikorn",
>             'restartPolicy': 'Always',
>             'containers': [{
>                 "name": "ubuntu",
>                 'image': 'ubuntu',
>                 "command": ["sleep", "604800"],
>                 "imagePullPolicy": "IfNotPresent",
>                 "resources": {
>                     "limits": {'cpu': "1"},
>                     "requests": {'cpu': "1"}
>                 }
>             }]
>         }
>     }
>     return pod_manifest
>
> for i in range(0, 4):
>     tier = str(i)
>     for j in range(0, 100):
>         exec = str(j)
>         pod_manifest = create_pod_manifest(tier, exec)
>         print(pod_manifest)
>         api_response = v1.create_namespaced_pod(body=pod_manifest, namespace="finance")
>         print(f"creating tier( {tier} ) exec( {exec} )")
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]