Re: Review Request: Speed up allocations by keeping a set of slaves with available resources

Thomas Marshall Mon, 11 Mar 2013 15:02:22 -0700


> On March 6, 2013, 7:23 p.m., Ben Mahler wrote:
> > src/master/hierarchical_allocator_process.hpp, line 529
> > <https://reviews.apache.org/r/6665/diff/8/?file=264106#file264106line529>
> >
> >     Looks to me like allocatable can just be a function on the slave struct:
> >     
> >     bool allocatable() const { return whitelisted && 
> > isSufficient(available); }
> >     
> >     Seems cleaner, no? That way we only have to update the whitelisted bit 
> > here and elsewhere.
> >     
> >     This loop would then become:
> >     
> >     foreachkey (const SlaveID& slaveId, slaves) {
> >       slave[slaveId].whitelisted = isWhitelisted(slaveId);
> >     }


I implemented this the way that you suggested because it seems reasonable, but 
I just wanted to note that this means that the title of this review is now 
meaningless and this patch only accomplishes restructuring the allocator to 
have clearer code and does not do anything about speeding allocations up, since 
we are back to checking whitelisting and resource minimums on all slaves for 
every allocation now (again, as per your suggestions).

I'm assuming that your relative lack of concern about efficiency issues means 
that Twitter isn't experiencing any problems with the allocator running too 
slowly, so I'm perfectly fine with this.


> On March 6, 2013, 7:23 p.m., Ben Mahler wrote:
> > src/master/hierarchical_allocator_process.hpp, line 452
> > <https://reviews.apache.org/r/6665/diff/8/?file=264106#file264106line452>
> >
> >     Inline?
> >     
> >     slaves[slaveId] = Slave(slaveInfo);
> >     
> >     Also, maybe you want to add whitelisted as an optional constructor 
> > parameter?

The problem with doing something like 'slaves[slaveId] = Slave(slaveInfo, 
isWhitelisted(slaveId))' is that you're calling isWhitelisted before slaveId is 
in slaves, meaning that the 'CHECK(slaves.contains(slaveId)' in isWhitelisted 
will fail and 'slaves[slaveId].hostname()' won't return anything. We could 
modify isWhitelisted to take the hostname as the parameter instead of a 
slaveId, but given that we're also using isWhitelisted in updateWhitelist, I 
would rather leave isWhitelisted the way it is.


> On March 6, 2013, 7:23 p.m., Ben Mahler wrote:
> > src/master/hierarchical_allocator_process.hpp, line 792
> > <https://reviews.apache.org/r/6665/diff/8/?file=264106#file264106line792>
> >
> >     const&?

This causes a seg fault because slaves[slaveId].available is updated below.


- Thomas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6665/#review17482
-----------------------------------------------------------


On March 11, 2013, 9:59 p.m., Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6665/
> -----------------------------------------------------------
> 
> (Updated March 11, 2013, 9:59 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Description
> -------
> 
> Currently, every time we do an allocation we have to traverse the entire list 
> of active slaves an check each one to see if its whitelisted and if it has 
> resources to allocate. This patch keeps a set of all slaves that meet those 
> requirements, and updates it when slaves are added/removed and when resources 
> are allocated/recovered.
> 
> Timing comparisons, using test_framework on a local cluster on my laptop 
> (each number is an average over 10 tests):
> 
> 100 slaves, 100 tasks
> without patch: 5.82s
> with patch: 5.773s
> improvement of about 1%
> 
> 1000 slaves, 100 tasks
> without patch: 8.261s
> with patch: 8.07s
> improvement of about 2%
> 
> Since this was a scalability issue, you'd presumably see a bigger improvement 
> in larger clusters but its difficult to simulate more than 1000 slaves or so 
> locally. If more convincing numbers are needed I can do some testing on EC2 
> (if nothing else, I'm hoping we'll have a Jenkins build with automated 
> performance tests set up later this semester, so I can test this in the 
> process of setting that up), but at the very least it seems clear that the 
> patch improves the runtime complexity of the allocation algorithm and doesn't 
> slow allocations down even in small clusters. 
> 
> 
> Diffs
> -----
> 
>   src/local/local.hpp 2633d25 
>   src/local/local.cpp 3402029 
>   src/master/allocator.hpp b68b67d 
>   src/master/drf_sorter.hpp 79566cc 
>   src/master/drf_sorter.cpp fe31a00 
>   src/master/hierarchical_allocator_process.hpp c1d6f54 
>   src/master/main.cpp ca0abec 
>   src/master/master.hpp c9b4b3f 
>   src/master/master.cpp 814a6e1 
>   src/master/sorter.hpp 73db6b1 
>   src/tests/allocator_tests.cpp b953cd1 
>   src/tests/allocator_zookeeper_tests.cpp 3f70202 
>   src/tests/fault_tolerance_tests.cpp 61509f1 
>   src/tests/gc_tests.cpp fbdd6d6 
>   src/tests/master_detector_tests.cpp 787ba19 
>   src/tests/master_tests.cpp 2ba14fc 
>   src/tests/resource_offers_tests.cpp 44eaf0d 
>   src/tests/sorter_tests.cpp 61e6038 
>   src/tests/utils.hpp d3efa58 
>   third_party/libprocess/third_party/stout/include/stout/stringify.hpp 
> 136316d 
> 
> Diff: https://reviews.apache.org/r/6665/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Thomas Marshall
> 
>

Re: Review Request: Speed up allocations by keeping a set of slaves with available resources

Reply via email to