Weiwei Yang created YUNIKORN-2270:
-------------------------------------
Summary: GPU Preemption is not triggered as expected when all
available GPUs are used
Key: YUNIKORN-2270
URL: https://issues.apache.org/jira/browse/YUNIKORN-2270
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Reporter: Weiwei Yang
I am testing an important GPU preemption scenario. The scenario is designed as
follows.
The queue structure is pretty simple:
{code}
root.a (min=100, max=300)
root.b (min=0, max=300)
{code}
The cluster has a total of 300 GPUs available, with no autoscaling. Steps to
reproduce:
1. Create 600 pods in the root.b queue, each requesting 1 GPU. This consumes
all 300 GPUs available in the cluster and leaves 300 pods pending.
2. Create 100 pods in the root.a queue, each requesting 1 GPU. The expectation
is that queue a will preempt 100 GPUs from queue b to reach its guarantee.
Observation: only a small number of pods in queue a got started on resources
preempted from queue b, and the result is not stable. Queue a could not reach
its guaranteed resources.
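
To make the expectation concrete, here is a minimal sketch of the arithmetic behind this scenario. It is a simplified model and not YuniKorn's actual preemption logic: the `Queue` class and the `expected_preemptions` function are hypothetical names used only for illustration, and the model assumes that a queue may reclaim GPUs up to its guarantee from another queue's usage above that queue's own guarantee.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    guaranteed: int   # "min" in the queue definition, in GPUs
    maximum: int      # "max" in the queue definition, in GPUs
    used: int = 0     # GPUs held by running pods
    pending: int = 0  # GPUs requested by pending pods


def expected_preemptions(asker: Queue, victim: Queue, free: int) -> int:
    """GPUs the asker would need to take from the victim (simplified model)."""
    # The asker is only entitled to its guarantee, and only if it has demand.
    entitled = min(asker.guaranteed, asker.used + asker.pending)
    shortfall = max(0, entitled - asker.used)
    # Free capacity is used first; preemption covers the remainder.
    shortfall = max(0, shortfall - free)
    # Only usage above the victim's own guarantee is assumed preemptable.
    preemptable = max(0, victim.used - victim.guaranteed)
    return min(shortfall, preemptable)


CLUSTER_GPUS = 300

# State after step 1: root.b holds every GPU and has 300 pods pending.
queue_b = Queue("root.b", guaranteed=0, maximum=300, used=300, pending=300)
# State after step 2: root.a has 100 pending pods and nothing running.
queue_a = Queue("root.a", guaranteed=100, maximum=300, used=0, pending=100)

free_gpus = CLUSTER_GPUS - queue_a.used - queue_b.used
to_preempt = expected_preemptions(queue_a, queue_b, free_gpus)

print(f"free GPUs: {free_gpus}")
print(f"expected preemptions from {queue_b.name}: {to_preempt}")
print(f"expected usage afterwards: a={queue_a.used + to_preempt}, "
      f"b={queue_b.used - to_preempt}")
```

Under these assumptions the expected steady state is 100 GPUs in root.a and 200 GPUs in root.b, which is what the observed behavior falls short of.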