[
https://issues.apache.org/jira/browse/AURORA-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259044#comment-14259044
]
Stephan Erb commented on AURORA-909:
------------------------------------
Regarding the preemptor, there seem to be a few low hanging fruits to optimize
its efficiency:
* Loop invariant attributestore access can be moved out of the loop over all
victims:
https://github.com/apache/incubator-aurora/blob/c1b078d56fba7690b5212416fce40c156014cafc/src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorImpl.java#L252
* The current sum computing the total available resources on a slave after
preemption is in order of O(n² ) for n preemptable tasks of a slave. This can
be improved by incrementing the sum as we go, instead of recomputing it per
iteration:
https://github.com/apache/incubator-aurora/blob/c1b078d56fba7690b5212416fce40c156014cafc/src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorImpl.java#L248
* By prefixing the victim loop with a single check whether the preempting all
preemptable tasks would be sufficient to schedule our pending tasks, we only
have to pay the O(n ) schedulingFilter computations when they will be
sufficient. In all other cases (e.g., when value constraints missmatch), we
only have to perform a single check.
> Make task scheduling more efficient
> -----------------------------------
>
> Key: AURORA-909
> URL: https://issues.apache.org/jira/browse/AURORA-909
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Reporter: Bill Farner
> Assignee: Maxim Khutornenko
>
> We're making a decent effort at reducing the _cost_ of task scheduling
> operations, abut have not yet invested in reducing the working set in a way
> that causes task scheduling to scale better. Each scheduling attempt for
> each task is an O(n) operation, where n is the number of offers.
> I would like to explore optimizations where we try to reduce the amount of
> redundant work performed in task scheduling. Say, for example, we're trying
> to schedule a task that needs 2 CPUs, and we only have offers with 1 CPU.
> Each scheduling round will re-assess every offer, despite the fact that the
> offers have not changed shape, and will always be a mismatch (hereafter
> termed _static_ mismatches). Instead, we should try to skip over offers that
> are a static mismatch. We could do this at the {{TaskGroup}} level, since
> every element in a task group is by definition statically equivalent. This
> means that jobs with a large number of instances could be scheduled very
> efficiently, since the first task scheduling round could identify static
> mismatches, reducing the working set in the next round.
> This is to contrast with _dynamic_ mismatches, where a change in the tasks on
> a machine or other settings could make a previously-ineligible offer become a
> match. The current sources of dynamic mismatches are limit constraints, host
> maintenance modes, and dedicated attributes.
> I propose we proceed in several steps, re-evaluating after each:
> 1. instrument the scheduler to better estimate the improvements
> 2. avoid future (offer, task group) evaluations when static mismatches are
> found
> 3. avoid future (offer, task group) evaluations when dynamic mismatches are
> found
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)