> On Feb. 22, 2013, 1:28 a.m., Ben Mahler wrote:
> > Higher level comment:
> >
> > It's unfortunate that, with this change, the allocator now has _3_
> > collections based on the slave id.
> >
> > Can you consider using one collection that keeps an appropriate struct
> > for the slave information? This would avoid the need to keep this new
> > hashset; the lookup would be transformed into checking a field on the
> > slave information struct.

I agree that it makes sense to combine the 'slaves' and 'allocatable'
hashmaps (which contain SlaveInfos and currently free resources,
respectively), especially since there will be more slave-specific state
to store in the struct once static allocations get introduced.

However, getting rid of the 'available' hashset somewhat defeats the
point of this patch, since we would be back to having every allocation
visit every slave. Allocations would still be a little faster, since
checking a flag on the slave struct is presumably cheaper than actually
performing the sufficient-resources/whitelist check for every slave,
but I would be more comfortable keeping the 'available' hashset. The
case where we need allocations to be fastest (a big cluster with enough
frameworks running to keep most of the slaves busy most of the time) is
precisely when the 'available' hashset will be smallest, and therefore
when keeping it around has the greatest impact on running time.
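To make the trade-off concrete, the combined bookkeeping might look
roughly like this (a sketch only, not the actual patch; it assumes
stout's hashmap/hashset and the existing SlaveID, SlaveInfo, and
Resources types, and the struct name is illustrative):

    // Sketch: one struct per slave replaces the parallel hashmaps.
    struct Slave
    {
      SlaveInfo info;       // What the 'slaves' hashmap stored.
      Resources available;  // What the 'allocatable' hashmap stored.
    };

    hashmap<SlaveID, Slave> slaves;

    // Kept as its own set so that allocations only ever touch slaves
    // that are whitelisted and have sufficient free resources.
    hashset<SlaveID> available;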
> On Feb. 22, 2013, 1:28 a.m., Ben Mahler wrote:
> > src/master/hierarchical_allocator_process.hpp, line 703
> > <https://reviews.apache.org/r/6665/diff/5/?file=259092#file259092line703>
> >
> > why the change here?

A slave needs to be in both slaveIds (which specifies which slaves we
want to consider for this allocation) and available (which specifies
which slaves have free resources to allocate) for us to allocate from
it, so we iterate through one collection and check for membership in
the other.

An allocation can happen in two ways. Either 'allocate()' gets called,
in which case slaveIds and available are the same and it doesn't matter
which one we iterate through and which one we check against; or
'allocate(slaveId)' gets called, for example when a new slave is added,
in which case slaveIds contains only one slave while available may be
arbitrarily large. That's why we now iterate through slaveIds and check
for membership in available.
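In sketch form (illustrative only; the sorter and framework logic are
elided):

    // Allocate from the intersection of the two collections.
    // Iterating 'slaveIds' is the cheap choice: allocate(slaveId)
    // passes a single slave, while 'available' may be arbitrarily
    // large.
    foreach (const SlaveID& slaveId, slaveIds) {
      if (!available.contains(slaveId)) {
        continue; // No free resources on this slave right now.
      }

      // ... make offers from this slave ...
    }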
> On Feb. 22, 2013, 1:28 a.m., Ben Mahler wrote:
> > src/master/hierarchical_allocator_process.hpp, line 145
> > <https://reviews.apache.org/r/6665/diff/5/?file=259092#file259092line145>
> >
> > s/offerable/sufficient
> >
> > I think the semantics you're capturing with this function are: Are the
> > resources sufficient?
> >
> > if (sufficient(resources)) reads intuitively to me

I made it isSufficient to be consistent with the way we've named other
predicates.

- Thomas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6665/#review16911
-----------------------------------------------------------


On Feb. 21, 2013, 2:10 a.m., Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6665/
> -----------------------------------------------------------
> 
> (Updated Feb. 21, 2013, 2:10 a.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Description
> -------
> 
> Currently, every time we do an allocation we have to traverse the
> entire list of active slaves and check each one to see if it's
> whitelisted and if it has resources to allocate. This patch keeps a
> set of all slaves that meet those requirements, and updates it when
> slaves are added/removed and when resources are allocated/recovered.
> 
> Timing comparisons, using test_framework on a local cluster on my
> laptop (each number is an average over 10 tests):
> 
> 100 slaves, 100 tasks
> without patch: 5.82s
> with patch: 5.773s
> improvement of about 1%
> 
> 1000 slaves, 100 tasks
> without patch: 8.261s
> with patch: 8.07s
> improvement of about 2%
> 
> Since this was a scalability issue, you'd presumably see a bigger
> improvement in larger clusters, but it's difficult to simulate more
> than 1000 slaves or so locally. If more convincing numbers are needed
> I can do some testing on EC2 (if nothing else, I'm hoping we'll have a
> Jenkins build with automated performance tests set up later this
> semester, so I can test this in the process of setting that up), but
> at the very least it seems clear that the patch improves the runtime
> complexity of the allocation algorithm and doesn't slow allocations
> down even in small clusters.
> 
> 
> Diffs
> -----
> 
>   src/master/hierarchical_allocator_process.hpp 33e059c 
> 
> Diff: https://reviews.apache.org/r/6665/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Thomas Marshall
> 
>
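For reference, the bookkeeping the quoted description refers to comes
down to four touch points where 'available' must be kept in sync. A
rough sketch (hypothetical helper names like onSlaveAdded and
isWhitelisted, reusing the 'slaves' struct from the sketch above; not
the literal diff):

    void onSlaveAdded(const SlaveID& slaveId)
    {
      // A new slave starts with all of its resources free.
      if (isWhitelisted(slaveId) &&
          isSufficient(slaves[slaveId].available)) {
        available.insert(slaveId);
      }
    }

    void onSlaveRemoved(const SlaveID& slaveId)
    {
      available.erase(slaveId);
    }

    void onResourcesAllocated(const SlaveID& slaveId)
    {
      // Drop the slave once its free resources are no longer
      // sufficient to offer.
      if (!isSufficient(slaves[slaveId].available)) {
        available.erase(slaveId);
      }
    }

    void onResourcesRecovered(const SlaveID& slaveId)
    {
      // Recovered resources may make the slave offerable again.
      if (isWhitelisted(slaveId) &&
          isSufficient(slaves[slaveId].available)) {
        available.insert(slaveId);
      }
    }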
