I'm curious about some code in JobTracker::getNewTaskForTaskTracker(). This is the method that decides what task to give a free TT.
For each call, the JT first figures out the 'remaining load' per TT for maps/reduces (which is the total number of map and reduce tasks that need to be run across all running jobs, divided by the num of TTs). It then figures out how many maximum map or reduce tasks should be run on the TT (which is the minimum of the TT's capacity and the 'remaining load') - call this the 'max load'. Finally, if a TT can run something (ie, if the # of maps/reduces it is running is less than the 'max load'), it looks to give it a map task or a reduce task. My questions: 1. I'm wondering why we need to do all this. It's probably to spread the tasks across slots. For eg., you may want to avoid the case where if there are 5 TTs, each with 10 free slots, and you have 30 tasks, you fill up all 10 slots each of the first three TTs, leaving the other two TTs under-utilized. Is this correct? 2. Suppose I have two TTs, each with 4 slots for Maps (it's the sme for reduces - doesn't really matter). Let's suppose each TT is already running two Maps each, and let's suppose there are 2 maps remaining to be run. Then it appears that those two remaining Maps cannot be run on either TT (for each TT, 'numMaps' = 2, and 'maxMapLoad' = 1, so the TT will not be given a task). The JT will have to wait till the 2 running Maps on each TT finsih, before it can allocate another Map, so the two unused slots in each TT go waste. 3. By balancing out the tasks, don't you run the risks of not fully utlizing faster nodes? I understand that the max load is calculated at each iteration, but the whole idea of balancing assumes some homegeneity among machines, which doesn't seem like a fair assumption. It just seems easier (both to code and understand) if you always give a TT what it asks for. If it has a free Map/Reduce slot, you give it a Map/Reduce task to run. if grids are typically always running at full capacity (which seems to be the case a lot of times), then greedily filling up a slot seems simple and correct - there are always more tasks to run than slots, and you'll always fill up slots and achieve good utilization. There doesn't seem to be any need for balancing. Or maybe I'm missing the real reason why we need to look at workload averages.
