Subramanian,

Yes, and for that reason the scheduler is pluggable. See the Capacity Scheduler and Fair Scheduler descriptions; both implement something along these lines, unlike the default scheduler, which is purely FIFO and hence shows this behavior.
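For illustration, swapping in the Fair Scheduler on a Hadoop 1.x JobTracker is a one-property change in mapred-site.xml (property name and class as in the linked docs; the scheduler jar must also be on the JobTracker's classpath):

```xml
<!-- mapred-site.xml: replace the default FIFO JobQueueTaskScheduler -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```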
Fair Scheduler: http://hadoop.apache.org/common/docs/stable/fair_scheduler.html
Capacity Scheduler: http://hadoop.apache.org/common/docs/stable/capacity_scheduler.html

On Mon, Jun 25, 2012 at 12:24 PM, Subramanian Ganapathy <subramanian.ganapath...@gmail.com> wrote:
> Hi,
>
> While reading the book "Hadoop: The Definitive Guide", chapter 6 ("How
> does MapReduce work?"), what I understood was that tasktrackers send
> heartbeat messages indicating free slots where tasks may be scheduled. The
> job scheduler receives these heartbeat messages and, based on the sender's
> address, schedules a task of the next job whose input split is "closest",
> in the network-topology sense, to the tasktracker from which the message
> was received.
>
> My question is: isn't the scheduler needlessly restricting the throughput
> of the system? That is, what if there were another job, not picked by the
> scheduler, whose tasks are more local to the current tasktracker, and by
> the time they get picked, the tasktracker has no free slots? Wouldn't a
> shortest-job-first scheduling algorithm make a lot more sense w.r.t.
> throughput and latency?
>
> Best,
> Subramanian

--
Harsh J
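For concreteness, the head-of-queue behavior discussed in the thread can be sketched as below. This is an illustrative model, not Hadoop code: the names (Task, Job, fifo_assign) are hypothetical, and it only captures the idea that a pure FIFO scheduler considers locality *within* the oldest job, never across jobs.

```python
# Hypothetical sketch (not Hadoop code): how a purely FIFO scheduler
# assigns a task on a tasktracker heartbeat. Only the oldest job with
# pending tasks is considered; within it, a data-local task is preferred.

class Task:
    def __init__(self, name, preferred_nodes):
        self.name = name
        self.preferred_nodes = set(preferred_nodes)  # nodes holding the split

class Job:
    def __init__(self, name, tasks):
        self.name = name
        self.pending = list(tasks)  # tasks not yet assigned

def fifo_assign(jobs, heartbeat_node):
    """Pick a task for the heartbeating tasktracker.

    FIFO: walk jobs oldest-first and stop at the first one with pending
    tasks; prefer a task whose input split lives on this node, otherwise
    hand out any pending task from that same job.
    """
    for job in jobs:  # oldest job first
        if not job.pending:
            continue
        local = [t for t in job.pending if heartbeat_node in t.preferred_nodes]
        task = local[0] if local else job.pending[0]
        job.pending.remove(task)
        return job.name, task.name
    return None  # no pending work anywhere

# job2 has a task whose split is local to node "n1", but FIFO never
# looks at it while job1 still has pending (non-local) tasks.
jobs = [Job("job1", [Task("m0", ["n9"])]),
        Job("job2", [Task("m0", ["n1"])])]
print(fifo_assign(jobs, "n1"))  # → ('job1', 'm0'), the non-local task
```

This is exactly the throughput concern in the question: the Fair and Capacity Schedulers relax the "oldest job only" restriction, so a later job's data-local task can be chosen instead.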