[
https://issues.apache.org/jira/browse/YUNIKORN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Higgins updated YUNIKORN-3025:
-----------------------------------
Priority: Minor (was: Major)
> Support for application-level preemption
> ----------------------------------------
>
> Key: YUNIKORN-3025
> URL: https://issues.apache.org/jira/browse/YUNIKORN-3025
> Project: Apache YuniKorn
> Issue Type: New Feature
> Components: core - scheduler
> Reporter: Eric Higgins
> Priority: Minor
>
> We would like to use YuniKorn's gang scheduling feature to schedule ML
> training jobs for different teams. We want to give each team a quota and
> allow teams to borrow resources from other teams' quotas, with the borrowing
> job preempted when the lending team needs those resources back. However,
> this does not appear to be supported currently, because YuniKorn lacks
> application-level preemption. It preempts individual pods until it has
> freed enough resources, and those pods may not belong to the same
> application. This is a problem for us because our training jobs are not
> fault-tolerant and fail if even a single pod is killed, so we want an
> entire application to be preempted at once.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)