Hi Brad, IMHO, here is how the Hadoop scheduler works.
For Hadoop 1.x: within one heartbeat interval, the FIFO scheduler keeps strict FIFO order and assigns a node as many local tasks as possible, but only one remote task. For the Fair scheduler, if you turn on the delay algorithm, it likewise assigns as many local tasks as possible, but no remote tasks to a job that has not waited longer than a threshold. Once a job has waited longer than the threshold, the Fair scheduler will assign remote tasks to that job.

For Hadoop 2.x: within one heartbeat interval, the FifoScheduler does not keep strict FIFO order. It assigns local containers to an AM on a given node; if that AM has blacklisted the node, the FifoScheduler will skip this AM and check the next AM in the job queue. For data locality, the FifoScheduler shares the same policy as the CapacityScheduler, since both use FiCaSchedulerApp.allocate() to assign containers.

Hope it is useful.

Regards,
Chen

On Tue, Jan 26, 2016 at 2:04 PM, Sultan <sul1...@gmail.com> wrote:
> Brad Childs <bdc@...> writes:
>
> > Sorry if this is the wrong list, I am looking for deep technical/Hadoop source help :)
> >
> > How does job scheduling work on the YARN framework for MapReduce jobs? I see the YARN scheduler discussed here:
> > https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html
> > which leads me to believe tasks are scheduled based on node capacity and not data locality. I've sifted through the fair scheduler and can't find anything about data location or locality.
> >
> > Where does data locality play into the scheduling of map/reduce tasks on YARN? Can someone point me to the Hadoop 2.x source where the data block location is used to calculate node/container/task assignment (if that's still happening).
> >
> > -bc
>
> Hi Brad,
>
> Were you able to find an answer for your question?
>
> Sultan
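P.S. The delay-scheduling decision the Fair scheduler makes per heartbeat could be sketched roughly as follows. This is a minimal illustration of the idea only; the class name, method, and threshold value are my own assumptions, not Hadoop's actual FairScheduler API.

```java
// Sketch of the delay-scheduling decision described above.
// Names and the threshold value are illustrative, not Hadoop's real API.
public class DelaySchedulingSketch {

    // Hypothetical threshold: how long (ms) a job waits for a node-local
    // slot before the scheduler relaxes locality and permits a remote task.
    static final long LOCALITY_WAIT_MS = 5000;

    /**
     * Decide whether a task of this job may launch on the given node.
     * Local tasks are always allowed; remote tasks only once the job
     * has waited past the threshold.
     */
    public static boolean shouldAssign(boolean nodeHasLocalData, long jobWaitMs) {
        if (nodeHasLocalData) {
            return true; // assign as many local tasks as possible
        }
        return jobWaitMs > LOCALITY_WAIT_MS; // remote only after waiting
    }

    public static void main(String[] args) {
        System.out.println(shouldAssign(true, 0));     // local slot
        System.out.println(shouldAssign(false, 1000)); // remote, still waiting
        System.out.println(shouldAssign(false, 6000)); // remote, waited past threshold
    }
}
```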