Stephan Erb created AURORA-1802:
-----------------------------------
Summary: AttributeAggregate slows down scheduling of jobs with
many instances
Key: AURORA-1802
URL: https://issues.apache.org/jira/browse/AURORA-1802
Project: Aurora
Issue Type: Bug
Components: Scheduler
Reporter: Stephan Erb
The current implementation of
[{{AttributeAggregate}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java]
slows down scheduling of jobs with many instances. This is currently not
visible in our job scheduling benchmark results because it only affects the
benchmark setup time, not the measured part.
{{AttributeAggregate}} relies on {{Suppliers.memoize}} to ensure that the
aggregate is computed at most once, and only when it is actually needed. This
was probably done because the factory
[{{AttributeAggregate.getJobActiveState}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java#L56-L91]
is slow.
After some recent changes to schedule multiple task instances per scheduling
round, the aggregate is now computed in each scheduling round via the call
[{{resourceRequest.getJobState().updateAttributeAggregate(...)}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java#L173]
in {{TaskAssigner}}. This means the expensive factory is called once per
scheduling round.
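The effect described above can be illustrated with a minimal, self-contained sketch. It does not use Aurora or Guava code: {{memoize}} is a hand-written stand-in for {{Suppliers.memoize}}, and {{expensiveFactory}}, {{simulate}} and the class name are hypothetical names chosen for illustration. The sketch assumes that each scheduling round ends up evaluating a freshly memoized supplier, which is one way to read the behaviour described here.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class MemoizeSketch {
  // Counts how often the (hypothetical) expensive factory actually runs.
  static final AtomicInteger FACTORY_CALLS = new AtomicInteger();

  // Minimal stand-in for Guava's Suppliers.memoize:
  // the delegate runs at most once per wrapper instance.
  static <T> Supplier<T> memoize(Supplier<T> delegate) {
    return new Supplier<T>() {
      private boolean computed;
      private T value;

      @Override
      public synchronized T get() {
        if (!computed) {
          value = delegate.get();
          computed = true;
        }
        return value;
      }
    };
  }

  // Hypothetical stand-in for the slow factory; the real one queries storage.
  static String expensiveFactory() {
    FACTORY_CALLS.incrementAndGet();
    return "aggregate";
  }

  // Simulates scheduling rounds and returns how many times the factory ran.
  static int simulate(int rounds, boolean freshSupplierPerRound) {
    FACTORY_CALLS.set(0);
    Supplier<String> shared = memoize(MemoizeSketch::expensiveFactory);
    for (int i = 0; i < rounds; i++) {
      Supplier<String> aggregate = freshSupplierPerRound
          ? memoize(MemoizeSketch::expensiveFactory)
          : shared;
      aggregate.get();
      aggregate.get(); // A second read in the same round hits the cache.
    }
    return FACTORY_CALLS.get();
  }

  public static void main(String[] args) {
    System.out.println(simulate(5, false)); // one shared supplier
    System.out.println(simulate(5, true));  // a new supplier every round
  }
}
```

With a shared supplier the factory runs once regardless of the number of rounds. With a new supplier per round, memoization only saves repeated reads within a round, so the factory cost grows linearly with the number of rounds.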
h3. Potential improvements
* The current factory implementation performs one {{fetchTasks}} query followed
by {{n}} distinct {{getHostAttributes}} queries. This could be reduced to a
single SQL query.
* The aggregate makes heavy use of {{ImmutableMultiset}} even though it is no
longer immutable. There is potential room for improvement here.
* The aggregate uses suppliers to perform a lazy instantiation even though its
current usage is no longer lazy. We can either make the implementation
eager or ensure that the expensive part only runs when absolutely necessary.
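As a rough illustration of the first two points, the following sketch compares one lookup per task with a single batched lookup, and accumulates counts in a plain mutable map. Everything here is hypothetical: {{AttributeStore}}, {{getHostAttributesBatch}} and the in-memory data stand in for the real storage layer, and the batched method only models "one round trip", not an actual SQL query.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

final class BatchedAggregateSketch {
  // Hypothetical in-memory attribute store that counts round trips.
  static final class AttributeStore {
    private final Map<String, List<String>> attributesByHost;
    private int queries;

    AttributeStore(Map<String, List<String>> attributesByHost) {
      this.attributesByHost = attributesByHost;
    }

    int queryCount() {
      return queries;
    }

    // One round trip per call, like the n getHostAttributes queries.
    List<String> getHostAttributes(String host) {
      queries++;
      return attributesByHost.getOrDefault(host, Collections.emptyList());
    }

    // Hypothetical batched variant: one round trip for all hosts.
    Map<String, List<String>> getHostAttributesBatch(Collection<String> hosts) {
      queries++;
      Map<String, List<String>> result = new HashMap<>();
      for (String host : hosts) {
        result.put(host, attributesByHost.getOrDefault(host, Collections.emptyList()));
      }
      return result;
    }
  }

  // Counts attribute occurrences with one store query per task.
  static Map<String, Integer> aggregatePerTask(AttributeStore store, List<String> taskHosts) {
    Map<String, Integer> counts = new HashMap<>();
    for (String host : taskHosts) {
      for (String attribute : store.getHostAttributes(host)) {
        counts.merge(attribute, 1, Integer::sum);
      }
    }
    return counts;
  }

  // Same result, but all distinct hosts are fetched in a single store query.
  static Map<String, Integer> aggregateBatched(AttributeStore store, List<String> taskHosts) {
    Map<String, List<String>> byHost =
        store.getHostAttributesBatch(new LinkedHashSet<>(taskHosts));
    Map<String, Integer> counts = new HashMap<>();
    for (String host : taskHosts) {
      for (String attribute : byHost.get(host)) {
        counts.merge(attribute, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, List<String>> data = new HashMap<>();
    data.put("host1", Arrays.asList("rack:a", "zone:x"));
    data.put("host2", Arrays.asList("rack:b", "zone:x"));
    List<String> taskHosts = Arrays.asList("host1", "host1", "host2");

    AttributeStore perTask = new AttributeStore(data);
    AttributeStore batched = new AttributeStore(data);
    System.out.println(aggregatePerTask(perTask, taskHosts).equals(aggregateBatched(batched, taskHosts)));
    System.out.println(perTask.queryCount() + " vs " + batched.queryCount());
  }
}
```

Both variants produce the same counts, but the per-task variant issues one store query per active task while the batched variant issues exactly one. Whether a mutable map is the right replacement for {{ImmutableMultiset}} in the real aggregate depends on how the aggregate is shared between threads, which this sketch does not model.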