We're planning to experiment with using existing batch scheduling
systems to address these concerns later in the year; Condor and
Torque are the leading contenders.
The thinking is that these systems represent huge investments in
configurable scheduling policies, and that it is best to keep Hadoop
simple and leverage them to get fine-grained, multi-user scheduling
control.
If this works, the idea is to run a durable HDFS cluster and have the
batch systems set up task tracker networks for each user on demand.
This approach is probably more applicable if you have large clusters
with distinct users / systems sharing them, so this may not address
your requirements.
In any case, this is why my team is not putting a lot of thought into
this problem in the short term. That said, I've always anticipated
that others in the Hadoop community might pursue improved scheduling.
I just advocate keeping it simple, because when you look at Condor or
Torque you will quickly appreciate how unsimple it can become!
On May 19, 2006, at 11:01 AM, Paul Sutter wrote:
A few suggestions to allow for a very simple extension to the current
scheduling (a toy code sketch of all three follows the list):
(1) Allow submission times in the future, enabling the creation of
"background" jobs. My understanding is that job submission times are
used to prioritize scheduling: all tasks from a job submitted early
run to completion before those of a job submitted later. If we could
submit any days-long jobs with a submission time in the future, say
the year 2010, and any short hours-long jobs with the current time,
the short job would be able to interrupt the long job. Hack? Yes.
Useful? I think so.
(2) Have a per-job total task count limit. Currently, we establish
the number of tasks each node runs, and how many map or reduce tasks
we have in total for a given job. But it would be great if we could
set a ceiling on the number of tasks that run concurrently for a
given job. This may help with Andrzej's fetcher (since it is
bandwidth constrained, maybe fewer concurrent tasks would be fine?).
(3) Don't start the reducers until a certain number of mappers have
completed (25%? 75%? 90%?). This optimization of starting early will
be less important when we've solved the map output copy problems.
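To make these concrete, here's a toy sketch of all three. Every name
in it is invented for illustration; none of this is the real
JobTracker API.

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.Comparator;
  import java.util.List;

  // Toy scheduler illustrating suggestions (1)-(3). All names are
  // made up for this sketch.
  class ToyScheduler {
      static class Job {
          final String name;
          final long submissionTime;    // (1) priority key; far future = background
          final int maxConcurrentTasks; // (2) proposed per-job ceiling
          final int totalMaps;
          int runningTasks;
          int finishedMaps;

          Job(String name, long submissionTime, int maxConcurrentTasks,
              int totalMaps) {
              this.name = name;
              this.submissionTime = submissionTime;
              this.maxConcurrentTasks = maxConcurrentTasks;
              this.totalMaps = totalMaps;
          }
      }

      // (3) Tunable fraction of maps that must finish before reducers start.
      static final double REDUCE_GATE = 0.75;

      // Pick the job that should receive the next free task slot.
      static Job pickJob(List<Job> jobs) {
          List<Job> byTime = new ArrayList<Job>(jobs);
          // (1) Earliest submission time wins, so a job stamped with
          // the year 2010 yields every slot to anything submitted "now".
          Collections.sort(byTime, new Comparator<Job>() {
              public int compare(Job a, Job b) {
                  return a.submissionTime < b.submissionTime ? -1
                       : a.submissionTime > b.submissionTime ? 1 : 0;
              }
          });
          for (Job job : byTime) {
              // (2) Skip jobs already running their configured maximum,
              // e.g. a bandwidth-bound fetch job capped well below the
              // cluster's total slot count.
              if (job.runningTasks < job.maxConcurrentTasks) {
                  return job;
              }
          }
          return null; // every job is at its ceiling
      }

      // (3) Hold reduce tasks back until enough maps have completed.
      static boolean mayScheduleReducers(Job job) {
          return job.totalMaps > 0
              && (double) job.finishedMaps / job.totalMaps >= REDUCE_GATE;
      }
  }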
Just a few ideas.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
Bryan A. Pendleton
Sent: Friday, May 19, 2006 10:44 AM
To: [email protected]
Subject: Re: Job scheduling (Re: Unable to run more than one job concurrently)
There are some additional risks to running simultaneous jobs. Right
now, Hadoop does a very bad job of dealing with out-of-space
conditions. If you run two jobs where the total amount of temporary
space (for map outputs) between both jobs is greater than the amount
of space available on the cluster, then they will both fail. If you
run them serially, they should both succeed.
At the very least, it's probably wise to take into account more than
just scheduling priority in any scheduler. (Expected) temporary space
demands, bandwidth limits, and size of jobs should be some of the
criteria available to the scheduler.
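For instance, something shaped like this (types and names invented
for the sketch, not an existing Hadoop interface) would let the
scheduler refuse work the cluster can't actually support:

  // Sketch of a resource-aware admission test; every name here is
  // hypothetical.
  interface SchedulingCriteria {
      long expectedTempSpaceBytes();    // map-output spill the job needs
      long bandwidthLimitBytesPerSec(); // e.g. a crawl job's fetch budget
      int totalTaskCount();             // job size, to favor draining small jobs
  }

  class ResourceAwareCheck {
      // Run before handing a job more tasks: refuse to start work whose
      // temporary space isn't actually available, so two big jobs can't
      // run the cluster out of disk and both fail.
      boolean eligible(SchedulingCriteria job, long freeTempSpaceBytes) {
          return job.expectedTempSpaceBytes() <= freeTempSpaceBytes;
      }
  }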
On 5/19/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Andrzej Bialecki wrote:
Hi all,
I'm running Hadoop on a relatively small cluster (5 nodes) with
growing datasets.
I noticed that if I start a job that is configured to run more map
tasks than the cluster's capacity (mapred.tasktracker.tasks.maximum *
number of nodes, 20 in this case), of course only that many map tasks
will run, and when they are finished the next map tasks from that job
will be scheduled.
However, when I try to start another job in parallel, only its reduce
tasks will be scheduled (uselessly spin-waiting for map output, and
only reducing the number of available task slots in the cluster...),
and no map tasks from this job will be scheduled until the first job
completes. This feels wrong: not only am I not making progress on the
second job, but I'm also taking slots away from the first job!
I'm somewhat miffed about this - I'd think that the jobtracker should
split the available resources evenly between these two jobs, i.e. it
should schedule some map tasks from the first job and some from the
second one. This is not what is happening, though...
Is this a configuration error, a bug, or a feature? :)
It seems it's a feature - I found the code in
JobTracker.pollForNewTask(), and I'm not too happy about it.
Let's consider the following example: if I'm running a Nutch fetcher,
the main limitation is the available bandwidth to fetch pages, and
not the capacity of the cluster. I'd love to be able to execute other
jobs in parallel, so that I don't have to wait until the fetcher
completes. I could sacrifice some of the task slots on the
tasktrackers for that other job, because the fetcher job wouldn't
suffer from this anyway (at least not too much).
So, I'd like to change this code to pick a random job from the list
jobsByArrival, and call job.obtainNewMapTask() on that randomly
selected job. Would that work? Additionally, if no map tasks from
that job have been allocated, I'd like to skip adding reduce tasks
from that job later, in lines 721-750.
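Roughly, in sketch form (stand-in types here, not the actual
JobTracker internals):

  import java.util.List;
  import java.util.Random;

  // Sketch of the change: instead of always walking jobsByArrival
  // from the front, start at a random job and wrap around, so one
  // long-running job can no longer monopolize every free map slot.
  class RandomJobPicker {
      interface RunningJob {
          // Stand-in for JobInProgress.obtainNewMapTask(); assumed to
          // return null when the job has no map task ready.
          Object obtainNewMapTask();
      }

      private final Random random = new Random();

      Object pickMapTask(List<RunningJob> jobsByArrival) {
          int n = jobsByArrival.size();
          if (n == 0) return null;
          int start = random.nextInt(n);
          for (int i = 0; i < n; i++) {
              RunningJob job = jobsByArrival.get((start + i) % n);
              Object task = job.obtainNewMapTask();
              if (task != null) return task;
          }
          return null; // no job had a map task to give
      }
  }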
Perhaps we should extend JobInProgress to include a priority, and
implement something a la Unix scheduler.
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
--
Bryan A. Pendleton
Ph: (877) geek-1-bp