[ https://issues.apache.org/jira/browse/YUNIKORN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106362#comment-17106362 ]
Wilfred Spiegelenburg commented on YUNIKORN-99:
-----------------------------------------------
PR opened with the base changes for the state aware scheduling of applications.
Two new application states are added: waiting and starting. An app now moves
from the new state to the accepted state when the first ask is added to the
application. As soon as that ask is allocated, the app moves from accepted to
starting. From starting, the app moves to running: either after it has spent a
maximum of 5 minutes in the starting state, or earlier if more allocations are
added to the application. After that, the app spends the rest of its time in
the running state.
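The transitions described so far can be sketched as a small event-driven state
machine. This is a minimal Python sketch, not the yunikorn-core implementation:
the class `App`, the enum `AppState`, the methods `ask_added`,
`allocation_added` and `tick`, and the constant `STARTING_TIMEOUT_SECONDS` are
hypothetical names, and treating the 5 minute limit as inclusive is an
assumption.

```python
from enum import Enum
from typing import Optional

# Hypothetical constant: the 5 minute limit for the starting state.
STARTING_TIMEOUT_SECONDS = 5 * 60


class AppState(Enum):
    NEW = "New"
    ACCEPTED = "Accepted"
    STARTING = "Starting"
    RUNNING = "Running"


class App:
    """Hypothetical model of the app lifecycle, not YuniKorn's code."""

    def __init__(self) -> None:
        self.state = AppState.NEW
        self.allocations = 0
        self.starting_since: Optional[float] = None

    def ask_added(self) -> None:
        # The first ask moves a new app to accepted.
        if self.state is AppState.NEW:
            self.state = AppState.ACCEPTED

    def allocation_added(self, now: float) -> None:
        self.allocations += 1
        if self.state is AppState.ACCEPTED:
            # First ask allocated: accepted -> starting, start the timer.
            self.state = AppState.STARTING
            self.starting_since = now
        elif self.state is AppState.STARTING:
            # A further allocation ends the starting phase early.
            self.state = AppState.RUNNING
            self.starting_since = None

    def tick(self, now: float) -> None:
        # Assumed periodic check: leave starting once the timeout expires.
        if (self.state is AppState.STARTING
                and now - self.starting_since >= STARTING_TIMEOUT_SECONDS):
            self.state = AppState.RUNNING
            self.starting_since = None
```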
An app leaves the running state and moves to waiting when it has no outstanding
asks and no allocations. The app is not done, but nothing is scheduled for it.
If a new ask is added, the app moves back to running and is scheduled as
normal.
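The move between running and waiting depends only on the number of outstanding
asks and allocations. A minimal sketch under assumed, hypothetical names
(`App`, `pending_asks`, `allocations`, `ask_added`, `ask_allocated`,
`allocation_released`, `ask_removed`); it models only these two states and
starts the app in running for brevity.

```python
from enum import Enum


class AppState(Enum):
    RUNNING = "Running"
    WAITING = "Waiting"


class App:
    """Hypothetical sketch of the running/waiting rule only."""

    def __init__(self) -> None:
        self.state = AppState.RUNNING
        self.pending_asks = 0
        self.allocations = 0

    def _check_waiting(self) -> None:
        # No outstanding asks and no allocations: idle, but not done.
        if (self.state is AppState.RUNNING
                and self.pending_asks == 0 and self.allocations == 0):
            self.state = AppState.WAITING

    def ask_added(self) -> None:
        self.pending_asks += 1
        if self.state is AppState.WAITING:
            # New work arrived: back to running, scheduled as normal.
            self.state = AppState.RUNNING

    def ask_allocated(self) -> None:
        # A pending ask turns into an allocation.
        self.pending_asks -= 1
        self.allocations += 1

    def allocation_released(self) -> None:
        self.allocations -= 1
        self._check_waiting()

    def ask_removed(self) -> None:
        self.pending_asks -= 1
        self._check_waiting()
```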
Applications can be killed or marked as completed by the RM, if the RM can
determine that state. The scheduler does not move an app into either of those
states itself.
The state aware policy for applications in a queue leverages the new states to
sort the applications. The logic for sorting applications in a queue is as
follows:
- only apps with pending resources are scheduled
- apps are sorted based on submission time, oldest app first
- all running applications are candidates
- a maximum of one app in the starting state will be added to the list of
running apps
- if the queue contains no (0) starting apps, with or without pending
resources, the oldest app in the accepted state will be added
The queue then uses that list of apps for scheduling.
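The filtering and ordering rules above can be sketched as a single function
over the apps in a queue. This is a hypothetical Python illustration, not the
scheduler's code: the `App` fields, the plain-string states, the function name
`schedulable_apps`, and the choice of the oldest starting app (when several
have pending resources) are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class App:
    # Hypothetical fields for illustration only.
    name: str
    state: str              # "New", "Accepted", "Starting", "Running", "Waiting"
    submission_time: float
    pending: int            # > 0 means the app has pending resources


def schedulable_apps(apps: List[App]) -> List[App]:
    """Hypothetical sketch of the state aware filter and sort."""
    by_age = sorted(apps, key=lambda a: a.submission_time)  # oldest first

    # All running apps with pending resources are candidates.
    result = [a for a in by_age if a.state == "Running" and a.pending > 0]

    # At most one starting app, and only if it has pending resources.
    starting = [a for a in by_age if a.state == "Starting"]
    for app in starting:
        if app.pending > 0:
            result.append(app)
            break

    # No starting app at all (pending or not): add the oldest accepted app.
    if not starting:
        for app in by_age:
            if app.state == "Accepted" and app.pending > 0:
                result.append(app)
                break

    # Keep the final list in submission order, oldest first.
    result.sort(key=lambda a: a.submission_time)
    return result
```

In this sketch a starting app without pending resources is not scheduled
itself, yet it still prevents the oldest accepted app from being added, which
appears to be what keeps the queue to one app in the starting phase at a time.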
On recovery, apps that have existing allocations are considered to be in the
running state.
> Enhanced FIFO scheduling for batch workloads
> --------------------------------------------
>
> Key: YUNIKORN-99
> URL: https://issues.apache.org/jira/browse/YUNIKORN-99
> Project: Apache YuniKorn
> Issue Type: New Feature
> Components: core - scheduler
> Reporter: Weiwei Yang
> Assignee: Wilfred Spiegelenburg
> Priority: Major
> Labels: pull-request-available
>
> An enhanced version of FIFO scheduling for batch workloads
--
This message was sent by Atlassian Jira
(v8.3.4#803005)