[
https://issues.apache.org/jira/browse/YUNIKORN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manikandan R resolved YUNIKORN-2270.
------------------------------------
Fix Version/s: 1.5.0
Resolution: Fixed
Merged to master
> GPU Preemption is not triggered as expected when all available GPUs are used
> ----------------------------------------------------------------------------
>
> Key: YUNIKORN-2270
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2270
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.5.0
>
>
> I am testing an important GPU preemption scenario, designed as follows.
> The queue structure is pretty simple:
> {code}
> root.a (min=100, max=300)
> root.b (min=0, max=300)
> {code}
> The cluster has a total of 300 GPUs available and no autoscaling. Steps to
> reproduce:
> 1. Create 600 pods in the root.b queue, each requesting 1 GPU. This consumes
> all 300 GPUs available in the cluster and leaves 300 pods pending.
> 2. Create 100 pods in the root.a queue, each requesting 1 GPU. The expectation
> is that queue a preempts 100 GPUs from queue b to reach its guarantee.
> Observation: only a small number of pods in queue a started on resources
> preempted from queue b, and the result is not stable. Queue a could not reach
> its guaranteed resources.
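
The expected outcome of the reproduction steps can be checked with a little arithmetic. The sketch below is only an illustration of that expectation, not YuniKorn's preemption code: the `Queue` class and the `expected_preemption` helper are hypothetical names, and the rule used (a queue below its guarantee may reclaim from a queue above its guarantee) is a simplified assumption based on the description above.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical model of a queue, counting GPUs only."""
    name: str
    guaranteed: int  # "min" in the queue structure above
    maximum: int     # "max" in the queue structure above
    used: int = 0
    pending: int = 0


def expected_preemption(starved: Queue, victim: Queue) -> int:
    """GPUs the starved queue would be expected to reclaim from the victim.

    Assumption: the starved queue can claim up to its guarantee (bounded by
    what it has actually requested), and the victim only gives up usage
    above its own guarantee.
    """
    shortfall = max(0, min(starved.guaranteed - starved.used, starved.pending))
    surplus = max(0, victim.used - victim.guaranteed)
    return min(shortfall, surplus)


# State after step 1: root.b holds all 300 GPUs, 300 more pods are pending.
queue_b = Queue("root.b", guaranteed=0, maximum=300, used=300, pending=300)
# State after step 2: root.a has 100 pending pods and no running pods.
queue_a = Queue("root.a", guaranteed=100, maximum=300, used=0, pending=100)

to_preempt = expected_preemption(queue_a, queue_b)
print(f"{queue_a.name} should reclaim {to_preempt} GPUs from {queue_b.name}")
```

Under these assumptions the expected value is 100 GPUs, which is the number the observed behavior fell short of.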