I'm working on a cluster with 360 reducer slots. I've got a big job, so when I launch it I follow the recommendations in the Hadoop documentation and set mapred.reduce.tasks=350, i.e. slightly fewer than the available number of slots.
The problem is that my reducers can still take a long time (2-4 hours) to run, so I end up grabbing a big slab of reducer slots and starving everybody else out. I've set my job priority to VERY_LOW and mapred.reduce.slowstart.completed.maps to 0.9, so I think I've done everything I can on the job-parameters front (there's a configuration sketch at the end of this post). Currently there isn't a way to make the individual reducers run faster, so I'm trying to figure out the best way to run my job so that it plays nicely with other users of the cluster. My rule of thumb has always been not to do any scheduling myself but to let Hadoop handle it for me, but I don't think that works in this scenario.

Questions:

1. Am I correct in thinking that long reducer times degrade Hadoop's scheduling granularity to a degree it can't handle? Are 4-hour reducers outside the normal operating range of Hadoop?

2. Is there any way to stagger task launches (aside from doing it manually)?

3. What if I set mapred.reduce.tasks to some value much, much larger than the number of available reducer slots, like 100,000? Will that make the amount of work sent to each reducer smaller (hence increasing the scheduler granularity), or will it have no effect?

4. In this scenario, do I just have to reconcile myself to the fact that my job is going to squat on a block of reducers no matter what, and set mapred.reduce.tasks to something much smaller than the available number of slots?
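For reference, here's roughly how the job is configured; a minimal sketch using the old mapred API, where the driver class, job name, and the mapper/reducer setup (omitted here) are placeholders, not my actual job:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.*;

    public class MyBigJob {  // placeholder driver; mapper/reducer setup omitted
      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MyBigJob.class);
        conf.setJobName("my-big-job");              // placeholder name

        conf.setNumReduceTasks(350);                // mapred.reduce.tasks: just under the 360 slots
        conf.setJobPriority(JobPriority.VERY_LOW);  // mapred.job.priority
        // Hold reducers back until 90% of the maps have completed.
        conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.9f);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }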
