Hi Allen,

Recent versions of the fair scheduler have configuration options for "delay scheduling": when a slot opens up, the scheduler waits a few seconds to try to find a local task for it before assigning a non-local one. This is specifically meant to avoid the issue you're describing.
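To make the idea concrete, here is a rough Python sketch of the decision the scheduler makes when a slot frees up. It is a simplified illustration, not the fair scheduler's actual code: `Task`, `Job`, `assign_task`, and the `locality_delay` parameter are all hypothetical names, the job list is assumed to be already sorted by whatever fairness policy is in use, and only node locality (not rack locality) is modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    hosts: frozenset  # nodes holding a replica of this task's input (assumed known)


@dataclass
class Job:
    name: str
    pending: list = field(default_factory=list)
    waiting_since: Optional[float] = None  # when the job first passed up a slot


def assign_task(jobs, node, now, locality_delay=5.0):
    """Pick a task for a free slot on `node`, or return None to leave it idle."""
    for job in jobs:  # assumed sorted by the scheduler's own ordering
        if not job.pending:
            continue
        # Prefer a task whose input data lives on this node.
        for task in job.pending:
            if node in task.hosts:
                job.pending.remove(task)
                job.waiting_since = None  # got locality, so reset the wait
                return task
        # No local task here: start the wait clock if it is not running yet.
        if job.waiting_since is None:
            job.waiting_since = now
        # Once the job has waited long enough, accept a non-local task.
        if now - job.waiting_since >= locality_delay:
            return job.pending.pop(0)
        # Otherwise skip this job and let the next one try the slot.
    return None


jobs = [
    Job("a", [Task("a1", frozenset({"n1"}))]),
    Job("b", [Task("b1", frozenset({"n2"}))]),
]
first = assign_task(jobs, "n2", now=0.0)   # job a skipped, job b is local
second = assign_task(jobs, "n2", now=2.0)  # job a still waiting, slot stays idle
third = assign_task(jobs, "n2", now=6.0)   # job a waited past the delay
```

In this sketch the slot on `n2` first goes to job b, whose data is local, then sits idle briefly, and only goes to job a's non-local task once job a has waited longer than `locality_delay`.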
Check out Matei's Eurosys 2010 paper here:
http://www.cs.berkeley.edu/~matei/papers/2010/eurosys_delay_scheduling.pdf

I believe this got lumped in with MAPREDUCE-706.

Thanks
-Todd

On Fri, May 28, 2010 at 11:37 AM, Allen Wittenauer <[email protected]> wrote:

>
> I've been thinking (which is always a dangerous thing) about data
> locality lately.
>
> If we look at file systems, there is this idea of 'reserved space'.
> This space is used for a variety of reasons, including to reduce
> fragmentation on busy file systems. This allows the file system driver to
> make smarter decisions of block placement and helping the overall
> throughput.
>
> At LinkedIn, we're about to build a new grid with a few hundred
> nodes. I'm beginning to wonder if it wouldn't make sense to actually 'hold
> back' some task slots from usage with this same concept in mind. Let's take
> a grid that is full: all of the task slots are in use. When a task ends,
> the scheduler has to make a decision as to which task gets used for any
> available task slots. If we assume a fairly FIFO view of the world (default
> scheduler, capacity, maybe fair share?), it pulls the next task off the
> stack and pushes it to the task slot. If only one task slot is free,
> locality doesn't enter into the picture at all. In essence, we've
> fragmented our execution.
>
> If we were to leave even 1 slot 'always' free (and therefore
> sacrificing execution speed by 1 slot), the scheduler could potentially make
> sure the task is host or rack local. If it can't, no loss--it wouldn't have
> been local anyway. Obviously reserving more slots as 'always' free
> increases our likelihood of being local. It just comes down to how much of
> a tradeoff it is worth.
>
> I guess the real question comes down to how much of an impact does
> data locality really have. I know in the case of the bigger grids at
> Yahoo!, the ops team suspected (but never did the homework to verify) that
> our grids and their usage so massive that the data locality rarely happened,
> especially for "popular" data. I can't help but wonder if the situation
> would have been better if we would have kept x% (say .005%?) of the grid
> free based upon the speculation above.
>
> Thoughts?

--
Todd Lipcon
Software Engineer, Cloudera
