Hi Allen,

Recent versions of the fair scheduler have configuration options for "delay
scheduling". Essentially, when a slot opens up and the job at the head of the
line has no local task for it, the scheduler waits a few seconds to try to
find a local task before assigning a non-local one. This is specifically to
avoid the issue you're describing.
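
To make the idea concrete, here is a rough Python sketch of the time-based
variant. It is not the fair scheduler's actual code: the Job class, the
assign() function, and the "delay" parameter are names I made up for
illustration, and it assumes the job list is already sorted in fair-share
order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Job:
    """Hypothetical job record, for illustration only."""
    name: str
    # Pending tasks: task id -> hosts that hold a replica of its input.
    pending: Dict[str, Set[str]] = field(default_factory=dict)
    # Time at which this job first passed up a slot for lack of a local task.
    wait_started: Optional[float] = None


def assign(jobs: List[Job], host: str, now: float,
           delay: float) -> Optional[Tuple[str, str, bool]]:
    """Pick a task for a free slot on `host`.

    Returns (job name, task id, is_local), or None if every job chose to wait.
    """
    for job in jobs:  # assumed to be in fair-share order already
        if not job.pending:
            continue
        # Prefer a task whose input lives on this host.
        local = next((t for t, hosts in job.pending.items() if host in hosts), None)
        if local is not None:
            del job.pending[local]
            job.wait_started = None  # got locality, so stop the clock
            return (job.name, local, True)
        # No local task: start (or continue) waiting.
        if job.wait_started is None:
            job.wait_started = now
        if now - job.wait_started >= delay:
            # Waited long enough; give up on locality for this task.
            task = next(iter(job.pending))
            del job.pending[task]
            return (job.name, task, False)
        # Otherwise skip this job and let the next one try the slot.
    return None
```

With delay set to 0 this degenerates to the behaviour you describe (the head
of the line always takes the slot). With a delay of a few seconds, the first
job gives up the slot to a later job that does have local data, and only
runs non-locally once it has waited out the delay.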

Check out Matei's Eurosys 2010 paper here:

http://www.cs.berkeley.edu/~matei/papers/2010/eurosys_delay_scheduling.pdf

I believe this got lumped in with MAPREDUCE-706.

Thanks
-Todd

On Fri, May 28, 2010 at 11:37 AM, Allen Wittenauer wrote:

>
>        I've been thinking (which is always a dangerous thing) about data
> locality lately.
>
>        If we look at file systems, there is this idea of 'reserved space'.
>  This space is used for a variety of reasons, including to reduce
> fragmentation on busy file systems.  This allows the file system driver to
> make smarter decisions about block placement, which helps overall
> throughput.
>
>        At LinkedIn, we're about to build a new grid with a few hundred
> nodes.  I'm beginning to wonder if it wouldn't make sense to actually 'hold
> back' some task slots from usage with this same concept in mind.  Let's take
> a grid that is full:  all of the task slots are in use.  When a task ends,
> the scheduler has to decide which pending task gets the newly freed task
> slot.  If we assume a fairly FIFO view of the world (default
> scheduler, capacity, maybe fair share?), it pulls the next task off the
> queue and pushes it to the task slot.  If only one task slot is free,
> locality doesn't enter into the picture at all.  In essence, we've
> fragmented our execution.
>
>        If we were to leave even 1 slot 'always' free (and therefore
> sacrifice execution speed by 1 slot), the scheduler could potentially make
> sure the task is host or rack local.  If it can't, no loss--it wouldn't have
> been local anyway.  Obviously reserving more slots as 'always' free
> increases our likelihood of being local.  It just comes down to how much of
> a tradeoff it is worth.
>
>        I guess the real question comes down to how much of an impact
> data locality really has.  I know in the case of the bigger grids at
> Yahoo!, the ops team suspected (but never did the homework to verify) that
> our grids and their usage were so massive that data locality rarely happened,
> especially for "popular" data.  I can't help but wonder if the situation
> would have been better if we had kept x% (say .005%?) of the grid
> free based upon the speculation above.
>
>        Thoughts?




-- 
Todd Lipcon
Software Engineer, Cloudera
