[jira] [Updated] (YUNIKORN-3025) Support for application-level preemption

Eric Higgins (Jira) Mon, 10 Feb 2025 11:23:07 -0800


     [ 
https://issues.apache.org/jira/browse/YUNIKORN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eric Higgins updated YUNIKORN-3025:
-----------------------------------
    Priority: Minor  (was: Major)

> Support for application-level preemption
> ----------------------------------------
>
>                 Key: YUNIKORN-3025
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3025
>             Project: Apache YuniKorn
>          Issue Type: New Feature
>          Components: core - scheduler
>            Reporter: Eric Higgins
>            Priority: Minor
>
> We would like to use YuniKorn's gang scheduling feature to schedule ML 
> training jobs for different teams. We want to give each team a quota and 
> allow teams to borrow resources from other teams' quotas, with the borrowing 
> job preempted when the lending team needs those resources back. However, this 
> does not appear to be supported currently, because YuniKorn lacks 
> application-level preemption. It preempts individual pods until it has freed 
> enough resources, and those pods may not belong to the same application. This 
> is a problem for us because our training jobs are not fault-tolerant and die 
> if a single pod is killed, so we want to preempt an entire application at 
> once.
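
To illustrate the difference described above, here is a minimal sketch contrasting pod-level and application-level victim selection. It is not YuniKorn code: the `Pod` type, both function names, the single `cpu` resource, and the simple first-come ordering are all assumptions made for illustration, and quota/borrowing checks are left out.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    app: str   # application (training job) the pod belongs to
    cpu: int   # resource units held by this pod (hypothetical single resource)


def pod_level_victims(pods, needed):
    """Pick individual pods in order until `needed` units are freed.

    Victims may come from several applications, so several jobs can break.
    """
    victims, freed = [], 0
    for pod in pods:
        if freed >= needed:
            break
        victims.append(pod)
        freed += pod.cpu
    return victims


def app_level_victims(pods, needed):
    """Pick whole applications in order until `needed` units are freed.

    Every pod of a chosen application is preempted together.
    """
    by_app = {}
    for pod in pods:
        by_app.setdefault(pod.app, []).append(pod)

    victims, freed = [], 0
    for app_pods in by_app.values():
        if freed >= needed:
            break
        victims.extend(app_pods)
        freed += sum(p.cpu for p in app_pods)
    return victims


# Two borrowing jobs with interleaved pods; the lending team needs 8 units back.
pods = [
    Pod("a-0", "job-a", 4), Pod("b-0", "job-b", 4),
    Pod("a-1", "job-a", 4), Pod("b-1", "job-b", 4),
]

pod_level = pod_level_victims(pods, 8)
app_level = app_level_victims(pods, 8)

print(sorted({p.app for p in pod_level}))  # apps hit by pod-level selection
print(sorted({p.app for p in app_level}))  # apps hit by app-level selection
```

In this example both strategies free the same 8 units, but pod-level selection takes one pod from each job and so breaks both, while application-level selection removes only `job-a` and leaves `job-b` intact.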



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
