Eric Higgins created YUNIKORN-3025:
--------------------------------------

             Summary: Support for application-level preemption
                 Key: YUNIKORN-3025
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3025
             Project: Apache YuniKorn
          Issue Type: New Feature
          Components: core - scheduler
            Reporter: Eric Higgins


We would like to use YuniKorn's gang scheduling feature to schedule ML training 
jobs for different teams. Each team should get a quota and be able to borrow 
resources from other teams' quotas, with the borrowing job preempted when the 
other team needs those resources back. This does not appear to be supported 
currently, because YuniKorn lacks application-level preemption: it preempts 
individual pods until enough resources have been freed, and those pods may 
belong to different applications. This is a problem for us because our 
training jobs are not fault-tolerant and fail if even a single pod is killed, 
so we want all pods of an application to be preempted together.
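
The difference between the two behaviours can be shown with a minimal sketch. 
This is not YuniKorn's actual victim-selection logic. The Pod type, both 
functions, the "largest pod first" and "smallest application first" orderings, 
and the GPU counts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    app: str   # owning application (hypothetical field)
    gpus: int  # resources held by this pod


def pod_level_victims(pods, needed):
    """Pick individual pods, largest first, until `needed` GPUs are freed."""
    victims, freed = [], 0
    for pod in sorted(pods, key=lambda p: p.gpus, reverse=True):
        if freed >= needed:
            break
        victims.append(pod)
        freed += pod.gpus
    return victims


def app_level_victims(pods, needed):
    """Pick whole applications, smallest first, until `needed` GPUs are freed."""
    by_app = defaultdict(list)
    for pod in pods:
        by_app[pod.app].append(pod)
    victims, freed = [], 0
    for app in sorted(by_app, key=lambda a: sum(p.gpus for p in by_app[a])):
        if freed >= needed:
            break
        # Every pod of the chosen application is preempted together.
        victims.extend(by_app[app])
        freed += sum(p.gpus for p in by_app[app])
    return victims


# Two borrowing training jobs, each holding 6 GPUs (assumed numbers).
pods = [
    Pod("a-0", "job-a", 4), Pod("a-1", "job-a", 2),
    Pod("b-0", "job-b", 4), Pod("b-1", "job-b", 2),
]
pod_apps = {p.app for p in pod_level_victims(pods, needed=6)}
app_apps = {p.app for p in app_level_victims(pods, needed=6)}
```

In this example the pod-level selection takes one pod from each job, so both 
non-fault-tolerant jobs fail, while the application-level selection affects 
only one job.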



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
