Choosing a task for the TT to run

Vivek Ratan Fri, 23 May 2008 03:45:07 -0700

I'm curious about some code in JobTracker::getNewTaskForTaskTracker().
This is the method that decides what task to give a free TT.


For each call, the JT first figures out the 'remaining load' per TT for
maps/reduces (which is the total number of map and reduce tasks that
need to be run across all running jobs, divided by the num of TTs). It
then figures out how many maximum map or reduce tasks should be run on
the TT (which is the minimum of the TT's capacity and the 'remaining
load') - call this the 'max load'. Finally, if a TT can run something
(ie, if the # of maps/reduces it is running is less than the 'max
load'), it looks to give it a map task or a reduce task. 

My questions: 

1. I'm wondering why we need to do all this. It's probably to spread the
tasks across slots. For eg., you may want to avoid the case where if
there are 5 TTs, each with 10 free slots, and you have 30 tasks, you
fill up all 10 slots each of the first three TTs, leaving the other two
TTs under-utilized. Is this correct? 

2. Suppose I have two TTs, each with 4 slots for Maps (it's the sme for
reduces - doesn't really matter). Let's suppose each TT is already
running two Maps each, and let's suppose there are 2 maps remaining to
be run. Then it appears that those two remaining Maps cannot be run on
either TT (for each TT, 'numMaps' = 2, and 'maxMapLoad' = 1, so the TT
will not be given a task). The JT will have to wait till the 2 running
Maps on each TT finsih, before it can allocate another Map, so the two
unused slots in each TT go waste. 

3. By balancing out the tasks, don't you run the risks of not fully
utlizing faster nodes? I understand that the max load is calculated at
each iteration, but the whole idea of balancing assumes some homegeneity
among machines, which doesn't seem like a fair assumption. 

It just seems easier (both to code and understand) if you always give a
TT what it asks for. If it has a free Map/Reduce slot, you give it a
Map/Reduce task to run. if grids are typically always running at full
capacity (which seems to be the case a lot of times), then greedily
filling up a slot seems simple and correct - there are always more tasks
to run than slots, and you'll always fill up slots and achieve good
utilization. There doesn't seem to be any need for balancing. 

Or maybe I'm missing the real reason why we need to look at workload
averages.

Choosing a task for the TT to run

Reply via email to