[ https://issues.apache.org/jira/browse/MESOS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425080#comment-15425080 ]
Jacob Janco commented on MESOS-3157:
------------------------------------
Some interesting output from the benchmark listed in the reviews:
Sample output without 51027:
[ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
Using 10000 agents and 3000 frameworks
Added 3000 frameworks in 57251us
Added 10000 agents in 3.21345353333333mins
allocator settled after 1.61236038333333mins
[       OK ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22 (290578 ms)
Sample output with 51027:
[ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22
Using 10000 agents and 3000 frameworks
Added 3000 frameworks in 39817us
Added 10000 agents in 3.22860541666667mins
allocator settled after 25.525654secs
[       OK ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.FrameworkFailover/22 (220137 ms)
> only perform batch resource allocations
> ---------------------------------------
>
> Key: MESOS-3157
> URL: https://issues.apache.org/jira/browse/MESOS-3157
> Project: Mesos
> Issue Type: Bug
> Components: allocation
> Reporter: James Peach
> Assignee: Jacob Janco
>
> Our deployment environments have a lot of churn, with many short-lived
> frameworks that often revive offers. Running the allocator takes a long time
> (from seconds up to minutes).
> In this situation, event-triggered allocation causes the event queue in the
> allocator process to get very long, and the allocator effectively becomes
> unresponsive (e.g., a revive offers message takes too long to reach the head
> of the queue).
> We have been running a patch to remove all the event-triggered allocations
> and only allocate from the batch task
> {{HierarchicalAllocatorProcess::batch}}. This works great and really improves
> responsiveness.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)