I've wondered about this too -- if there aren't enough tasks for all the tasktrackers, sometimes it might be better to leave some tasktrackers completely idle. These idle tasktrackers might then be reclaimed by some higher-level resource scheduler. There might also be some power consumption implications for idle machines.
At least, it would be nice to have this behavior configurable.

-Michael

On 5/23/08 3:43 AM, "Vivek Ratan" <[EMAIL PROTECTED]> wrote:

> I'm curious about some code in JobTracker::getNewTaskForTaskTracker().
> This is the method that decides what task to give a free TT.
>
> For each call, the JT first figures out the 'remaining load' per TT for
> maps/reduces (which is the total number of map and reduce tasks that
> need to be run across all running jobs, divided by the number of TTs).
> It then figures out the maximum number of map or reduce tasks that
> should run on the TT (which is the minimum of the TT's capacity and the
> 'remaining load') - call this the 'max load'. Finally, if a TT can run
> something (i.e., if the number of maps/reduces it is running is less
> than the 'max load'), it looks to give it a map task or a reduce task.
>
> My questions:
>
> 1. I'm wondering why we need to do all this. It's probably to spread
> the tasks across slots. For example, you may want to avoid the case
> where, if there are 5 TTs, each with 10 free slots, and you have 30
> tasks, you fill up all 10 slots on each of the first three TTs, leaving
> the other two TTs under-utilized. Is this correct?
>
> 2. Suppose I have two TTs, each with 4 slots for maps (it's the same
> for reduces - doesn't really matter). Suppose each TT is already
> running two maps, and there are 2 maps remaining to be run. Then it
> appears that those two remaining maps cannot be run on either TT (for
> each TT, 'numMaps' = 2 and 'maxMapLoad' = 1, so the TT will not be
> given a task). The JT will have to wait until the 2 running maps on
> each TT finish before it can allocate another map, so the two unused
> slots on each TT go to waste.
>
> 3. By balancing out the tasks, don't you run the risk of not fully
> utilizing faster nodes? I understand that the max load is recalculated
> on each iteration, but the whole idea of balancing assumes some
> homogeneity among machines, which doesn't seem like a fair assumption.
>
> It just seems easier (both to code and to understand) if you always
> give a TT what it asks for: if it has a free map/reduce slot, you give
> it a map/reduce task to run. If grids are typically always running at
> full capacity (which seems to be the case a lot of the time), then
> greedily filling up a slot seems simple and correct - there are always
> more tasks to run than slots, so you'll always fill up slots and
> achieve good utilization. There doesn't seem to be any need for
> balancing.
>
> Or maybe I'm missing the real reason why we need to look at workload
> averages.
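
To make the arithmetic in question 2 concrete, here is a minimal sketch
of the load calculation as the email describes it. The names
remainingMaps, numTaskTrackers, and ttCapacity are illustrative, not
the actual JobTracker fields, and whether the per-TT load rounds up or
down doesn't matter for this particular example.

public class LoadBalanceSketch {

    // Per-tracker cap as described above: the lesser of the tracker's
    // slot capacity and the cluster-wide 'remaining load' per TT.
    static int maxMapLoad(int remainingMaps, int numTaskTrackers, int ttCapacity) {
        int remainingLoadPerTT = (int) Math.ceil((double) remainingMaps / numTaskTrackers);
        return Math.min(ttCapacity, remainingLoadPerTT);
    }

    public static void main(String[] args) {
        // The scenario from question 2: two TTs with 4 map slots each,
        // each already running 2 maps, and 2 maps left to schedule.
        int remainingMaps = 2, numTaskTrackers = 2, ttCapacity = 4;
        int runningMaps = 2; // 'numMaps' in the email

        int maxLoad = maxMapLoad(remainingMaps, numTaskTrackers, ttCapacity);
        System.out.println("maxMapLoad = " + maxLoad);                   // prints 1
        System.out.println("assign a task? " + (runningMaps < maxLoad)); // prints false
        // Each TT has 2 free slots, yet neither is offered a task: the
        // 2 remaining maps wait until one of the running maps finishes.
    }
}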

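For contrast, a sketch of the greedy policy proposed at the end of the
email: hand a tracker a task whenever it reports a free slot, with no
cluster-wide averaging. The names here (TrackerStatus, pendingMaps,
getNewMapTask) are made up for illustration, not the real Hadoop API.

import java.util.ArrayDeque;
import java.util.Queue;

public class GreedySketch {

    static final class TrackerStatus {
        final int mapSlots;
        int runningMaps;
        TrackerStatus(int mapSlots, int runningMaps) {
            this.mapSlots = mapSlots;
            this.runningMaps = runningMaps;
        }
    }

    // Greedy assignment: no workload averages, just "free slot => task".
    static String getNewMapTask(TrackerStatus tt, Queue<String> pendingMaps) {
        if (tt.runningMaps < tt.mapSlots && !pendingMaps.isEmpty()) {
            tt.runningMaps++;
            return pendingMaps.poll();
        }
        return null; // tracker is full, or nothing left to run
    }

    public static void main(String[] args) {
        Queue<String> pendingMaps = new ArrayDeque<>();
        pendingMaps.add("map_0001");
        pendingMaps.add("map_0002");

        // Same scenario as question 2: each TT has 4 slots, 2 in use.
        TrackerStatus tt1 = new TrackerStatus(4, 2);
        TrackerStatus tt2 = new TrackerStatus(4, 2);

        // Under the greedy policy both pending maps are placed at once,
        // so no slots sit idle while work is waiting.
        System.out.println(getNewMapTask(tt1, pendingMaps)); // map_0001
        System.out.println(getNewMapTask(tt2, pendingMaps)); // map_0002
    }
}

The trade-off is the one raised in question 1: with fewer tasks than
total slots, a greedy scheduler can pile work onto a few trackers while
others stay empty - which is exactly the situation where leaving some
trackers fully idle (for reclamation or power savings) might actually
be desirable.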