> On Feb. 22, 2013, 1:28 a.m., Ben Mahler wrote:
> > Higher level comment:
> > 
> > It's unfortunate that, with this change, the allocator now has _3_ 
> > collections based on the slave id.
> > 
> > Can you consider using one collection that keeps an appropriate struct for 
> > the slave information?
> > This would avoid the need to keep this new hashset, the lookup would be 
> > transformed into checking a field on the slave information struct.

I agree that it makes sense to combine the 'slaves' and 'allocatable' hashmaps 
(which contain SlaveInfos and currently free resources, respectively), 
especially since there will be more slave-specific state to store in the 
struct once static allocations get introduced.

However, getting rid of the 'available' hashset somewhat defeats the point of 
this patch, since we would be back to having every allocation visit every 
slave. Allocations would still be a little faster than they are today, since 
we would presumably be checking a flag on the slave struct instead of actually 
performing the sufficient-resources/whitelist check for every slave, but I 
would be more comfortable keeping the 'available' hashset. The case where we 
most need allocations to be fast (a big cluster with enough frameworks running 
to keep most of the slaves busy most of the time) is precisely when 
'available' will be smallest, and therefore when keeping it around will have 
the greatest impact on running time.
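
Roughly what I have in mind, as a sketch only (the struct layout, field names, 
and comments are illustrative, not the actual diff; it assumes the existing 
stout hashmap/hashset and the mesos SlaveID/SlaveInfo/Resources types):

    // One entry per slave, combining what 'slaves' and 'allocatable'
    // hold today.
    struct Slave
    {
      SlaveInfo info;       // Static slave information.
      Resources available;  // Resources currently free on this slave.
      // Room for more per-slave state when static allocations land.
    };

    hashmap<SlaveID, Slave> slaves;

    // Kept as a separate index: slaves that are whitelisted and still
    // have allocatable resources. Updated when slaves are added/removed
    // and when resources are allocated/recovered, so allocate() only
    // has to walk this set.
    hashset<SlaveID> available;

That way, consolidating the per-slave state and keeping the fast-path index 
are independent decisions.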


> On Feb. 22, 2013, 1:28 a.m., Ben Mahler wrote:
> > src/master/hierarchical_allocator_process.hpp, line 703
> > <https://reviews.apache.org/r/6665/diff/5/?file=259092#file259092line703>
> >
> >     why the change here?

A slave needs to be in both slaveIds (the slaves we want to consider for this 
allocation) and available (the slaves with free resources to allocate) for us 
to allocate from it, so we iterate through one set and check for membership in 
the other. An allocation can happen in two ways. Either 'allocate()' gets 
called, in which case slaveIds and available are the same and it doesn't 
matter which one we iterate through and which one we check against. Or 
'allocate(slaveId)' gets called, for example when a new slave is added, in 
which case slaveIds contains only one slave while available may be arbitrarily 
large. So we now iterate through slaveIds and check for membership in 
available.
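
Concretely, the loop looks roughly like this (a sketch, not the exact diff; 
'slaveIds' and 'available' are the two sets described above):

    // Iterate over the (usually smaller) set of slaves we were asked to
    // consider and probe the 'available' index, so the work stays
    // proportional to the slaves relevant to this allocation.
    foreach (const SlaveID& slaveId, slaveIds) {
      if (!available.contains(slaveId)) {
        continue; // No free, whitelisted resources on this slave.
      }
      // ... make offers from this slave's free resources ...
    }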


> On Feb. 22, 2013, 1:28 a.m., Ben Mahler wrote:
> > src/master/hierarchical_allocator_process.hpp, line 145
> > <https://reviews.apache.org/r/6665/diff/5/?file=259092#file259092line145>
> >
> >     s/offerable/sufficient
> >     
> >     I think the semantics you're capturing with this function are: Are the 
> > resources sufficient?
> >     
> >     if (sufficient(resources)) reads intuitively to me

I made it isSufficient to be consistent with the way we've named other 
predicates.


- Thomas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6665/#review16911
-----------------------------------------------------------


On Feb. 21, 2013, 2:10 a.m., Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6665/
> -----------------------------------------------------------
> 
> (Updated Feb. 21, 2013, 2:10 a.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Description
> -------
> 
> Currently, every time we do an allocation we have to traverse the entire list 
> of active slaves and check each one to see if it's whitelisted and if it has 
> resources to allocate. This patch keeps a set of all slaves that meet those 
> requirements, and updates it when slaves are added/removed and when resources 
> are allocated/recovered.
> 
> Timing comparisons, using test_framework on a local cluster on my laptop 
> (each number is an average over 10 tests):
> 
> 100 slaves, 100 tasks
> without patch: 5.82s
> with patch: 5.773s
> improvement of about 1%
> 
> 1000 slaves, 100 tasks
> without patch: 8.261s
> with patch: 8.07s
> improvement of about 2%
> 
> Since this was a scalability issue, you'd presumably see a bigger improvement 
> in larger clusters, but it's difficult to simulate more than 1000 slaves or 
> so locally. If more convincing numbers are needed, I can do some testing on 
> EC2 (if nothing else, I'm hoping we'll have a Jenkins build with automated 
> performance tests set up later this semester, so I can test this in the 
> process of setting that up). At the very least, it seems clear that the 
> patch improves the runtime complexity of the allocation algorithm and doesn't 
> slow allocations down even in small clusters.
> 
> 
> Diffs
> -----
> 
>   src/master/hierarchical_allocator_process.hpp 33e059c 
> 
> Diff: https://reviews.apache.org/r/6665/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Thomas Marshall
> 
>
