[
https://issues.apache.org/jira/browse/AURORA-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15677106#comment-15677106
]
Mehrdad Nurolahzade commented on AURORA-1802:
---------------------------------------------
According to benchmarks, in-memory database engines like H2 do not benefit
from batching as much as other databases do.
The following benchmark reported only a 20% improvement, so if batching yields
any gain here at all, it is unlikely to be substantial:
[http://java-persistence-performance.blogspot.com/2013/05/batch-writing-and-dynamic-vs.html]
> AttributeAggregate slows down scheduling of jobs with many instances
> --------------------------------------------------------------------
>
> Key: AURORA-1802
> URL: https://issues.apache.org/jira/browse/AURORA-1802
> Project: Aurora
> Issue Type: Bug
> Components: Scheduler
> Reporter: Stephan Erb
> Fix For: 0.17.0
>
>
> The current implementation of
> [{{AttributeAggregate}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java]
> slows down scheduling of jobs with many instances. Interestingly, this is
> currently not visible in our job scheduling benchmark results, because it
> only affects the benchmark setup time, not the measured part.
> {{AttributeAggregate}} relies on {{Suppliers.memoize}} to ensure that it is
> only computed once and only when necessary. This has probably been done
> because the factory
> [{{AttributeAggregate.getJobActiveState}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java#L56-L91]
> is slow.
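For readers unfamiliar with the pattern, here is a minimal sketch of the semantics {{Suppliers.memoize}} provides: the delegate runs lazily, at most once, on the first {{get()}}, and later calls return the cached value. The {{memoize}} helper and {{expensiveFactory}} below are hypothetical stand-ins written without Guava so the snippet runs on its own; Guava's actual implementation differs in detail.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MemoizeSketch {
    // Hypothetical stand-in for Guava's Suppliers.memoize: computes the delegate
    // lazily, at most once, and caches the result (double-checked locking).
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private volatile boolean initialized;
            private T value;

            @Override
            public T get() {
                if (!initialized) {
                    synchronized (this) {
                        if (!initialized) {
                            value = delegate.get();
                            initialized = true; // volatile write publishes value
                        }
                    }
                }
                return value;
            }
        };
    }

    // Counts how often the (hypothetical) expensive factory actually runs.
    static final AtomicInteger FACTORY_CALLS = new AtomicInteger();

    static String expensiveFactory() {
        FACTORY_CALLS.incrementAndGet();
        return "aggregate";
    }

    public static void main(String[] args) {
        Supplier<String> aggregate = memoize(MemoizeSketch::expensiveFactory);
        System.out.println(FACTORY_CALLS.get()); // prints 0: nothing computed before get()
        aggregate.get();
        aggregate.get();
        System.out.println(FACTORY_CALLS.get()); // prints 1: computed once, then cached
    }
}
```

The caching only pays off when the same supplier instance is reused; if a fresh memoized aggregate is created and immediately forced in every round, the factory still runs once per round.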
> After some recent changes to schedule multiple task instances per scheduling
> round, the aggregate is now computed in each scheduling round via the call
> [{{resourceRequest.getJobState().updateAttributeAggregate(...)}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java#L173]
> in {{TaskAssigner}}. This means the expensive factory is called once per
> scheduling round.
> h3. Potential improvements
> * The current factory implementation performs one {{fetchTasks}} query
> followed by {{n}} distinct {{getHostAttributes}} queries. This could be
> reduced to a single SQL query.
> * The aggregate makes heavy use of {{ImmutableMultiset}} even though the
> aggregate itself is no longer immutable. There is likely room for
> improvement here.
> * The aggregate uses suppliers for lazy instantiation even though its
> current usage is no longer lazy. We can either make the implementation
> eager or ensure that the expensive part only runs when absolutely
> necessary.
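To make the three improvements above more concrete, here is a hedged sketch of an eager aggregate built from one batched lookup and then updated in place with plain mutable counters instead of {{ImmutableMultiset}}. Everything in it is hypothetical: {{TaskRow}}, {{HOST_ATTRIBUTES}}, {{ACTIVE_TASKS}}, and the query counter stand in for Aurora's storage layer and are not Aurora APIs. It contrasts the current shape (one task query plus one attribute lookup per host; deduplicating hosts is an assumption of the sketch) with a single joined query whose rows already carry each host's attributes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EagerAggregateSketch {
    // Hypothetical row: an active task of the job and the host it was scheduled to.
    record TaskRow(String taskId, String host) {}

    // Hypothetical host attribute store: host -> (attribute name -> value).
    static final Map<String, Map<String, String>> HOST_ATTRIBUTES = Map.of(
        "host-a", Map.of("rack", "r1", "zone", "z1"),
        "host-b", Map.of("rack", "r1", "zone", "z2"),
        "host-c", Map.of("rack", "r2", "zone", "z2"));

    // Hypothetical active tasks of a single job.
    static final List<TaskRow> ACTIVE_TASKS = List.of(
        new TaskRow("t0", "host-a"), new TaskRow("t1", "host-b"),
        new TaskRow("t2", "host-c"), new TaskRow("t3", "host-a"));

    // Number of simulated store round trips.
    static int queries = 0;

    // Current shape: one task query, then one attribute lookup per host (n lookups).
    static Map<String, Map<String, Integer>> buildWithPerHostQueries() {
        queries++; // stands in for fetchTasks
        Set<String> hosts = new HashSet<>();
        for (TaskRow task : ACTIVE_TASKS) {
            hosts.add(task.host());
        }
        Map<String, Map<String, String>> attributesByHost = new HashMap<>();
        for (String host : hosts) {
            queries++; // stands in for getHostAttributes(host)
            attributesByHost.put(host, HOST_ATTRIBUTES.get(host));
        }
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (TaskRow task : ACTIVE_TASKS) {
            addHost(counts, attributesByHost.get(task.host()));
        }
        return counts;
    }

    // Proposed shape: one (hypothetical) joined query; each result row already
    // carries the attributes of its task's host.
    static Map<String, Map<String, Integer>> buildWithSingleQuery() {
        queries++; // stands in for a single tasks-JOIN-attributes query
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (TaskRow task : ACTIVE_TASKS) {
            addHost(counts, HOST_ATTRIBUTES.get(task.host()));
        }
        return counts;
    }

    // Eager, in-place update of mutable counters (attribute name -> value -> count),
    // e.g. after one more instance is placed on a host; no store access needed.
    static void addHost(Map<String, Map<String, Integer>> counts, Map<String, String> attributes) {
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            counts.computeIfAbsent(attribute.getKey(), name -> new HashMap<>())
                .merge(attribute.getValue(), 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        queries = 0;
        Map<String, Map<String, Integer>> current = buildWithPerHostQueries();
        int perHostQueries = queries;
        queries = 0;
        Map<String, Map<String, Integer>> proposed = buildWithSingleQuery();
        int singleQuery = queries;
        // prints "4 vs 1 queries, same counts: true"
        System.out.println(perHostQueries + " vs " + singleQuery + " queries, same counts: "
            + current.equals(proposed));
        addHost(proposed, HOST_ATTRIBUTES.get("host-b")); // one more instance on host-b
        System.out.println(proposed.get("rack").get("r1")); // prints 4
    }
}
```

Because the counters are updated in place, a per-round update touches only the attributes of the newly assigned host instead of re-running the store queries.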
> h3. Proof of concept
> * 4 mins 23.407 secs -- total runtime of
> {{./gradlew jmh -Pbenchmarks='SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark'}}
> * 2 mins 40.308 secs -- total runtime of
> {{./gradlew jmh -Pbenchmarks='SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark'}}
> with [{{resourceRequest.getJobState().updateAttributeAggregate(...)}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java#L173]
> commented out. This is safe here because the call is unnecessary when only a
> single instance is scheduled per scheduling round, which is what the
> benchmarks do.
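A quick back-of-the-envelope reading of the two runtimes above. Note that these are total wall-clock times of the Gradle {{jmh}} task, so they include benchmark setup and build overhead rather than a per-operation JMH score; the ratio is only a rough indication of the cost of the call.

```latex
\begin{aligned}
t_{\text{before}} &= 4 \cdot 60 + 23.407 = 263.407\ \text{s} \\
t_{\text{after}}  &= 2 \cdot 60 + 40.308 = 160.308\ \text{s} \\
\Delta t &= t_{\text{before}} - t_{\text{after}} = 103.099\ \text{s} \\
\frac{\Delta t}{t_{\text{before}}} &= \frac{103.099}{263.407} \approx 0.391 \quad (\approx 39\%\ \text{less total runtime}) \\
\frac{t_{\text{before}}}{t_{\text{after}}} &= \frac{263.407}{160.308} \approx 1.64
\end{aligned}
```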
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)