> On March 6, 2013, 7:23 p.m., Ben Mahler wrote:
> > src/master/hierarchical_allocator_process.hpp, line 529
> > <https://reviews.apache.org/r/6665/diff/8/?file=264106#file264106line529>
> >
> >     Looks to me like allocatable can just be a function on the slave struct:
> >     
> >     bool allocatable() const { return whitelisted && 
> > isSufficient(available); }
> >     
> >     Seems cleaner, no? That way we only have to update the whitelisted bit 
> > here and elsewhere.
> >     
> >     This loop would then become:
> >     
> >     foreachkey (const SlaveID& slaveId, slaves) {
> >       slave[slaveId].whitelisted = isWhitelisted(slaveId);
> >     }
> 
> Thomas Marshall wrote:
>     I implemented this the way that you suggested because it seems 
> reasonable, but I just wanted to note that this means that the title of this 
> review is now meaningless and this patch only accomplishes restructuring the 
> allocator to have clearer code and does not do anything about speeding 
> allocations up, since we are back to checking whitelisting and resource 
> minimums on all slaves for every allocation now (again, as per your 
> suggestions).
>     
>     I'm assuming that your relative lack of concern about efficiency issues 
> means that Twitter isn't experiencing any problems with the allocator running 
> too slowly, so I'm perfectly fine with this.

Great! I don't quite get your comment here. We do indeed now check resource 
minimums but "checking whitelisting" is *only* checking the newly added 
whitelisted bit, rather than the hashset, right? (are you referring to line 
699?)


> On March 6, 2013, 7:23 p.m., Ben Mahler wrote:
> > src/master/hierarchical_allocator_process.hpp, line 452
> > <https://reviews.apache.org/r/6665/diff/8/?file=264106#file264106line452>
> >
> >     Inline?
> >     
> >     slaves[slaveId] = Slave(slaveInfo);
> >     
> >     Also, maybe you want to add whitelisted as an optional constructor 
> > parameter?
> 
> Thomas Marshall wrote:
>     The problem with doing something like 'slaves[slaveId] = Slave(slaveInfo, 
> isWhitelisted(slaveId))' is that you're calling isWhitelisted before slaveId 
> is in slaves, meaning that the 'CHECK(slaves.contains(slaveId)' in 
> isWhitelisted will fail and 'slaves[slaveId].hostname()' won't return 
> anything. We could modify isWhitelisted to take the hostname as the parameter 
> instead of a slaveId, but given that we're also using isWhitelisted in 
> updateWhitelist, I would rather leave isWhitelisted the way it is.

Got it, sounds like a bad abstraction, maybe we should fix that in a later 
review!


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6665/#review17482
-----------------------------------------------------------


On March 11, 2013, 9:59 p.m., Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6665/
> -----------------------------------------------------------
> 
> (Updated March 11, 2013, 9:59 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Description
> -------
> 
> Currently, every time we do an allocation we have to traverse the entire list 
> of active slaves an check each one to see if its whitelisted and if it has 
> resources to allocate. This patch keeps a set of all slaves that meet those 
> requirements, and updates it when slaves are added/removed and when resources 
> are allocated/recovered.
> 
> Timing comparisons, using test_framework on a local cluster on my laptop 
> (each number is an average over 10 tests):
> 
> 100 slaves, 100 tasks
> without patch: 5.82s
> with patch: 5.773s
> improvement of about 1%
> 
> 1000 slaves, 100 tasks
> without patch: 8.261s
> with patch: 8.07s
> improvement of about 2%
> 
> Since this was a scalability issue, you'd presumably see a bigger improvement 
> in larger clusters but its difficult to simulate more than 1000 slaves or so 
> locally. If more convincing numbers are needed I can do some testing on EC2 
> (if nothing else, I'm hoping we'll have a Jenkins build with automated 
> performance tests set up later this semester, so I can test this in the 
> process of setting that up), but at the very least it seems clear that the 
> patch improves the runtime complexity of the allocation algorithm and doesn't 
> slow allocations down even in small clusters. 
> 
> 
> Diffs
> -----
> 
>   src/local/local.hpp 2633d25 
>   src/local/local.cpp 3402029 
>   src/master/allocator.hpp b68b67d 
>   src/master/drf_sorter.hpp 79566cc 
>   src/master/drf_sorter.cpp fe31a00 
>   src/master/hierarchical_allocator_process.hpp c1d6f54 
>   src/master/main.cpp ca0abec 
>   src/master/master.hpp c9b4b3f 
>   src/master/master.cpp 814a6e1 
>   src/master/sorter.hpp 73db6b1 
>   src/tests/allocator_tests.cpp b953cd1 
>   src/tests/allocator_zookeeper_tests.cpp 3f70202 
>   src/tests/fault_tolerance_tests.cpp 61509f1 
>   src/tests/gc_tests.cpp fbdd6d6 
>   src/tests/master_detector_tests.cpp 787ba19 
>   src/tests/master_tests.cpp 2ba14fc 
>   src/tests/resource_offers_tests.cpp 44eaf0d 
>   src/tests/sorter_tests.cpp 61e6038 
>   src/tests/utils.hpp d3efa58 
>   third_party/libprocess/third_party/stout/include/stout/stringify.hpp 
> 136316d 
> 
> Diff: https://reviews.apache.org/r/6665/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Thomas Marshall
> 
>

Reply via email to