> that can run (per job) at any given time.

Not possible AFAIK - but I will be happy to hear otherwise. Priorities are a good substitute, though: there's no point needlessly restricting concurrency if there's nothing else to run, and if there is something more important to run, then in most cases assigning a higher priority to that other job makes the right thing happen - except with long-running tasks (usually reducers) that cannot be preempted. (Hadoop does not seem to use OS process priorities at all. I wonder whether process priorities could be used as a substitute for preemption.)

HOD is another solution you might want to look into - my understanding is that with HOD you can restrict the number of machines used by a job.

________________________________
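The "OS process priorities as a substitute for preemption" idea floated above could be sketched as a shell loop that renices a job's task JVMs. This is purely a hypothetical sketch, not anything Hadoop does: the `TaskTracker$Child` pattern is an assumption about how the child task JVMs show up in `ps` output, and deciding *which* pids belong to the low-priority job is left unsolved here.

```shell
# Hypothetical sketch: lower the CPU scheduling priority of a job's task
# JVMs so tasks from other jobs get the CPU first. The ps pattern below is
# an assumption about the child JVM's main class, not a documented interface.
PATTERN='TaskTracker$Child'
for pid in $(ps -eo pid,args | grep -F "$PATTERN" | grep -v grep | awk '{print $1}'); do
    renice -n 10 -p "$pid"    # +10 = lower priority; needs suitable permissions
done

# 'nice' applies the same idea when launching a process in the first place:
nice -n 10 echo "started at low priority"
```

Note this only deprioritizes CPU time on each node; it does not free up task slots, so it is at best a weak approximation of preemption.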
From: Xavier Stevens [mailto:[EMAIL PROTECTED]]
Sent: Wed 1/9/2008 2:57 PM
To: hadoop-user@lucene.apache.org
Subject: RE: Question on running simultaneous jobs

This doesn't solve the issue, because it sets the total number of map/reduce tasks. When I set the total number of map tasks I get an ArrayIndexOutOfBoundsException inside Hadoop, I believe because of the input dataset size (around 90 million lines). I think it is important to distinguish between setting the total number of map/reduce tasks and the number that can run (per job) at any given time. I would like to restrict only the latter, while allowing Hadoop to divide the data into chunks as it sees fit.

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 09, 2008 1:50 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Question on running simultaneous jobs

You may need to upgrade, but 0.15.1 does just fine with multiple jobs in the cluster. Use conf.setNumMapTasks(int) and conf.setNumReduceTasks(int).

On 1/9/08 11:25 AM, "Xavier Stevens" <[EMAIL PROTECTED]> wrote:

> Does Hadoop support running simultaneous jobs? If so, what parameters
> do I need to set in my job configuration? We basically want to give a
> job that takes a really long time half of the total resources of the
> cluster, so other jobs don't queue up behind it.
>
> I am currently using Hadoop 0.14.2. I tried setting
> mapred.tasktracker.tasks.maximum to half of the maximum specified
> in mapred-default.xml. This shows the change in the web
> administration page for the job, but it has no effect on the actual
> number of tasks running.
>
> Thanks,
>
> Xavier
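For reference, the property Xavier tried is a per-TaskTracker setting (typically overridden in hadoop-site.xml), which explains the behavior he saw: it caps how many task slots a *node* offers to all jobs combined, not how many tasks one job may run at once. A sketch of the override, with an illustrative value:

```xml
<!-- hadoop-site.xml: caps concurrent tasks per TaskTracker, across ALL jobs.
     It is not a per-job limit, so it cannot reserve half the cluster for
     one long-running job. The value 2 is illustrative. -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
</property>
```

The web UI can display the changed value while the running task count stays the same if the TaskTrackers were not restarted with the new configuration, or if the override was placed where only the client reads it.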