Hi Brad, IMHO, here is how the Hadoop scheduler works.
For Hadoop 1.x: within one heartbeat interval, the FIFO scheduler keeps strict FIFO order and assigns a node as many local tasks as possible, but only one remote task. For the Fair scheduler, if you turn on the delay algorithm, it likewise assigns as many local tasks as possible, but no remote tasks to a job that has not waited longer than a threshold. Once a job has waited longer than the threshold, the Fair scheduler will assign remote tasks to that job.

For Hadoop 2.x: within one heartbeat interval, the FifoScheduler does not keep strict FIFO order. It assigns local containers to an AM on a given node; if that AM has blacklisted the node, the FifoScheduler will skip this AM and check the next AM in the job queue. For data locality, the FifoScheduler shares the same policy as the CapacityScheduler, since both use FiCaSchedulerApp.allocate() to assign containers.

Hope it is useful.

Regards,
Chen

On Tue, Jan 26, 2016 at 2:04 PM, Sultan <sul1...@gmail.com> wrote:
> Brad Childs <bdc@...> writes:
>
> > Sorry if this is the wrong list, I am looking for deep technical/Hadoop source help :)
> >
> > How does job scheduling work on the YARN framework for MapReduce jobs? I see the YARN scheduler discussed here:
> > https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YARN.html
> > which leads me to believe tasks are scheduled based on node capacity and not data locality. I've sifted through the fair scheduler and can't find anything about data location or locality.
> >
> > Where does data locality play into the scheduling of map/reduce tasks on YARN? Can someone point me to the Hadoop 2.x source where the data block location is used to calculate node/container/task assignment (if that's still happening).
> >
> > -bc
>
> Hi Brad,
>
> Were you able to find an answer for your question?
>
> Sultan
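P.S. The delay-scheduling decision the Fair scheduler makes per heartbeat could be sketched roughly as follows. This is a minimal illustration of the idea only; the class name, method, and threshold value are my own assumptions, not Hadoop's actual FairScheduler API.

```java
// Sketch of the delay-scheduling decision described above.
// Names and the threshold value are illustrative, not Hadoop's real API.
public class DelaySchedulingSketch {

    // Hypothetical threshold: how long (ms) a job waits for a node-local
    // slot before the scheduler relaxes locality and permits a remote task.
    static final long LOCALITY_WAIT_MS = 5000;

    /**
     * Decide whether a task of this job may launch on the given node.
     * Local tasks are always allowed; remote tasks only once the job
     * has waited past the threshold.
     */
    public static boolean shouldAssign(boolean nodeHasLocalData, long jobWaitMs) {
        if (nodeHasLocalData) {
            return true; // assign as many local tasks as possible
        }
        return jobWaitMs > LOCALITY_WAIT_MS; // remote only after waiting
    }

    public static void main(String[] args) {
        System.out.println(shouldAssign(true, 0));     // local slot
        System.out.println(shouldAssign(false, 1000)); // remote, still waiting
        System.out.println(shouldAssign(false, 6000)); // remote, waited past threshold
    }
}
```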