Weiwei Yang created YUNIKORN-2270:
-------------------------------------

             Summary: GPU Preemption is not triggered as expected when all 
available GPUs are used
                 Key: YUNIKORN-2270
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2270
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
            Reporter: Weiwei Yang


I am testing an important preemption scenario for GPUs. The scenario is 
designed as follows:

The queue structure is pretty simple:

{code}
root.a (min=100, max=300)
root.b (min=0, max=300)
{code}

The cluster has a total of 300 GPUs available, with no autoscaling. Steps to 
reproduce:

1. Create 600 pods in the root.b queue, each requesting 1 GPU. This consumes 
all 300 GPUs available in the cluster and leaves 300 pods pending.
2. Create 100 pods in the root.a queue, each requesting 1 GPU. The expectation 
is that queue a will preempt 100 GPUs from queue b to reach its guarantee.
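
The expected numbers in the steps above can be checked with a small model. This is only an illustrative sketch of the arithmetic, not YuniKorn's actual preemption logic: the Queue class, the expected_preemption helper, and the rule "reclaim up to the guarantee, limited by what the other queue holds above its own guarantee" are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical, simplified queue model (not a YuniKorn type)."""
    name: str
    guaranteed: int  # "min" in the queue structure above
    maximum: int     # "max" in the queue structure above
    used: int = 0
    pending: int = 0


def expected_preemption(starved: Queue, victim: Queue) -> int:
    """GPUs the starved queue would be expected to reclaim from the victim."""
    # How far the starved queue is below its guarantee, capped by real demand.
    shortfall = max(0, starved.guaranteed - starved.used)
    wanted = min(shortfall, starved.pending)
    # The victim can only give up what it holds above its own guarantee.
    reclaimable = max(0, victim.used - victim.guaranteed)
    return min(wanted, reclaimable)


CLUSTER_GPUS = 300
a = Queue("root.a", guaranteed=100, maximum=300)
b = Queue("root.b", guaranteed=0, maximum=300)

# Step 1: 600 one-GPU pods in root.b fill the cluster.
requested_b = 600
b.used = min(requested_b, b.maximum, CLUSTER_GPUS)
b.pending = requested_b - b.used

# Step 2: 100 one-GPU pods arrive in root.a.
a.pending = 100
to_preempt = expected_preemption(a, b)

print(b.used, b.pending, to_preempt)  # 300 300 100
```

Under these assumptions, queue a should end up with 100 running pods and queue b with 200, which is the outcome the scenario expects.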

Observation: only a small number of pods in queue a were started with 
resources preempted from queue b, and the result is not stable. Queue a could 
not reach its guaranteed resources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
