Bill Farner created AURORA-909:
----------------------------------

             Summary: Make task scheduling more efficient
                 Key: AURORA-909
                 URL: https://issues.apache.org/jira/browse/AURORA-909
             Project: Aurora
          Issue Type: Story
          Components: Scheduler
            Reporter: Bill Farner


We're making a decent effort at reducing the _cost_ of task scheduling 
operations, but have not yet invested in reducing the working set in a way 
that causes task scheduling to scale better.  Each scheduling attempt for each 
task is an O(n) operation, where n is the number of offers.

I would like to explore optimizations where we try to reduce the amount of 
redundant work performed in task scheduling.  Say, for example, we're trying to 
schedule a task that needs 2 CPUs, and we only have offers with 1 CPU.  Each 
scheduling round will re-assess every offer, despite the fact that the offers 
have not changed shape, and will always be a mismatch (hereafter termed 
_static_ mismatches).  Instead, we should try to skip over offers that are a 
static mismatch.  We could do this at the {{TaskGroup}} level, since every 
element in a task group is by definition statically equivalent.  This means 
that jobs with a large number of instances could be scheduled very efficiently, 
since the first task scheduling round could identify static mismatches, 
reducing the working set in the next round.
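
As a rough illustration of the idea above, here is a minimal sketch in Python. It is not Aurora code: the Offer and TaskGroup fields, the StaticMismatchCache class, and all of its methods are hypothetical names, and resource comparison is reduced to CPU and RAM. It remembers, per task group, which offers were already found to be static mismatches and skips them in later rounds.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Offer:
    offer_id: str
    cpus: float
    ram_mb: int


@dataclass(frozen=True)
class TaskGroup:
    key: str        # hypothetical identifier shared by all tasks in the group
    cpus: float
    ram_mb: int


@dataclass
class StaticMismatchCache:
    # task group key -> ids of offers known to be static mismatches
    _vetoed: dict = field(default_factory=dict)
    evaluations: int = 0  # counts full (offer, task group) comparisons

    def is_static_match(self, group, offer):
        self.evaluations += 1
        return offer.cpus >= group.cpus and offer.ram_mb >= group.ram_mb

    def find_offer(self, group, offers):
        vetoed = self._vetoed.setdefault(group.key, set())
        for offer in offers:
            if offer.offer_id in vetoed:
                continue  # known static mismatch, skip without re-evaluating
            if self.is_static_match(group, offer):
                return offer
            vetoed.add(offer.offer_id)
        return None

    def offer_removed(self, offer_id):
        # Assumption: an offer that returns later gets a fresh evaluation.
        for vetoed in self._vetoed.values():
            vetoed.discard(offer_id)
```

With two 1-CPU offers and a task group that needs 2 CPUs, the first call to find_offer performs two comparisons and the second performs none, so the per-round cost for that group drops from every offer to only the offers not yet seen.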

This is in contrast to _dynamic_ mismatches, where a change in the tasks on a 
machine or other settings could make a previously-ineligible offer become a 
match.  The current sources of dynamic mismatches are limit constraints, host 
maintenance modes, and dedicated attributes.
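
Caching dynamic mismatches would need some form of invalidation, since the result can flip when the host changes. A minimal sketch of one possible approach, assuming vetoes are tracked per host and dropped wholesale on any change to that host (DynamicMismatchCache and its methods are hypothetical, not existing scheduler classes):

```python
from collections import defaultdict


class DynamicMismatchCache:
    def __init__(self):
        # host -> task group keys vetoed on that host for dynamic reasons
        self._vetoed_by_host = defaultdict(set)

    def record_mismatch(self, host, group_key):
        self._vetoed_by_host[host].add(group_key)

    def is_vetoed(self, host, group_key):
        return group_key in self._vetoed_by_host.get(host, ())

    def host_changed(self, host):
        # Assumption: any change on the host (a task starting or stopping,
        # a maintenance mode transition, an attribute change) may turn a
        # mismatch into a match, so forget everything cached for it.
        self._vetoed_by_host.pop(host, None)
```

Dropping all cached vetoes for a host is deliberately conservative. A finer-grained scheme could invalidate only the entries affected by the specific change, at the cost of more bookkeeping.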

I propose we proceed in several steps, re-evaluating after each:
1. instrument the scheduler to better estimate the improvements
2. avoid future (offer, task group) evaluations when static mismatches are found
3. avoid future (offer, task group) evaluations when dynamic mismatches are 
found
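
For step 1, one simple measurement would be the fraction of (offer, task group) evaluations that repeat an earlier mismatch for the same pair. The sketch below is only an assumption about what such instrumentation might look like; SchedulingStats and its fields are hypothetical names, and it does not distinguish static from dynamic mismatches.

```python
class SchedulingStats:
    def __init__(self):
        self.evaluations = 0   # every (offer, task group) evaluation
        self.redundant = 0     # evaluations repeating an earlier mismatch
        self._seen_mismatches = set()

    def record(self, offer_id, group_key, matched):
        self.evaluations += 1
        pair = (offer_id, group_key)
        if matched:
            # The pair matched this time, so any earlier mismatch was dynamic.
            self._seen_mismatches.discard(pair)
        elif pair in self._seen_mismatches:
            self.redundant += 1
        else:
            self._seen_mismatches.add(pair)

    def redundant_fraction(self):
        if self.evaluations == 0:
            return 0.0
        return self.redundant / self.evaluations
```

A high redundant_fraction under production load would suggest that step 2 is worth pursuing. A low one would suggest the working set is already changing too quickly for caching to help much.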



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
