[jira] [Commented] (AURORA-909) Make task scheduling more efficient

Stephan Erb (JIRA) Fri, 26 Dec 2014 03:44:14 -0800

    [ 
https://issues.apache.org/jira/browse/AURORA-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259044#comment-14259044
 ]


Stephan Erb commented on AURORA-909:
------------------------------------

Regarding the preemptor, there seem to be a few low hanging fruits to optimize 
its efficiency:

* Loop invariant attributestore access can be moved out of the loop over all 
victims: 
https://github.com/apache/incubator-aurora/blob/c1b078d56fba7690b5212416fce40c156014cafc/src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorImpl.java#L252
* The current sum computing the total available resources on a slave after 
preemption is in order of O(n² ) for n preemptable tasks of a slave. This can 
be improved by incrementing the sum as we go, instead of recomputing it per 
iteration: 
https://github.com/apache/incubator-aurora/blob/c1b078d56fba7690b5212416fce40c156014cafc/src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorImpl.java#L248
 
* By prefixing the victim loop with a single check whether the preempting all 
preemptable tasks would be sufficient to schedule our pending tasks, we only 
have to pay the O(n ) schedulingFilter computations when they will be 
sufficient. In all other cases (e.g., when value constraints missmatch), we 
only have to perform a single check.

> Make task scheduling more efficient
> -----------------------------------
>
>                 Key: AURORA-909
>                 URL: https://issues.apache.org/jira/browse/AURORA-909
>             Project: Aurora
>          Issue Type: Story
>          Components: Scheduler
>            Reporter: Bill Farner
>            Assignee: Maxim Khutornenko
>
> We're making a decent effort at reducing the _cost_ of task scheduling 
> operations, abut have not yet invested in reducing the working set in a way 
> that causes task scheduling to scale better.  Each scheduling attempt for 
> each task is an O(n) operation, where n is the number of offers.
> I would like to explore optimizations where we try to reduce the amount of 
> redundant work performed in task scheduling.  Say, for example, we're trying 
> to schedule a task that needs 2 CPUs, and we only have offers with 1 CPU.  
> Each scheduling round will re-assess every offer, despite the fact that the 
> offers have not changed shape, and will always be a mismatch (hereafter 
> termed _static_ mismatches).  Instead, we should try to skip over offers that 
> are a static mismatch.  We could do this at the {{TaskGroup}} level, since 
> every element in a task group is by definition statically equivalent.  This 
> means that jobs with a large number of instances could be scheduled very 
> efficiently, since the first task scheduling round could identify static 
> mismatches, reducing the working set in the next round.
> This is to contrast with _dynamic_ mismatches, where a change in the tasks on 
> a machine or other settings could make a previously-ineligible offer become a 
> match.  The current sources of dynamic mismatches are limit constraints, host 
> maintenance modes, and dedicated attributes.
> I propose we proceed in several steps, re-evaluating after each:
> 1. instrument the scheduler to better estimate the improvements
> 2. avoid future (offer, task group) evaluations when static mismatches are 
> found
> 3. avoid future (offer, task group) evaluations when dynamic mismatches are 
> found



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (AURORA-909) Make task scheduling more efficient

Reply via email to