I will add to the discussion that the ability to have multiple tasks of equal priority all making progress simultaneously is important in academic environments. A number of undergraduate programs are starting to use Hadoop in code labs for students.

Multiple students should be able to submit jobs, and if one student's poorly-written task is grinding up a lot of cycles on a shared cluster, other students still need to be able to test their code in the meantime; ideally, they would not need to wait in a lengthy job queue. ... I'd say this actually applies to development clusters in general, where individual task performance is less important than the ability of multiple developers to test code concurrently.

- Aaron



Joydeep Sen Sarma wrote:
> ... the number that can run (per job) at any given time.

Not possible afaik - but I will be happy to hear otherwise. Priorities are a good substitute, though: there's no point needlessly restricting concurrency if there's nothing else to run, and if there is something more important to run, then in most cases assigning a higher priority to that other thing would make the right thing happen - except with long-running tasks (usually reducers) that cannot be preempted.

(Hadoop does not seem to use OS process priorities at all. I wonder if process priorities could be used as a substitute for pre-emption.)

HOD is another solution that you might want to look into - my understanding is that with HOD you can restrict the number of machines used by a job.
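To make the priority route concrete, here is a minimal sketch using the old JobConf API. It assumes a release where job priorities are honored (they may not be in 0.14.2); treat the "mapred.job.priority" property name as an assumption, and ImportantJob as a placeholder for your own job class:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ImportantJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(ImportantJob.class);
            // Assumption: "mapred.job.priority" accepts VERY_LOW, LOW, NORMAL,
            // HIGH, or VERY_HIGH; releases as old as 0.14.2 may ignore it.
            conf.set("mapred.job.priority", "HIGH");
            // ... set mapper, reducer, and input/output paths as usual ...
            JobClient.runJob(conf);  // higher-priority jobs get task slots first
        }
    }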

From: Xavier Stevens [mailto:[EMAIL PROTECTED]
Sent: Wed 1/9/2008 2:57 PM
To: hadoop-user@lucene.apache.org
Subject: RE: Question on running simultaneous jobs



This doesn't solve the issue, because it sets the total number of
map/reduce tasks. When I set the total number of map tasks I get an
ArrayIndexOutOfBoundsException from within Hadoop; I believe because of
the input dataset size (around 90 million lines).

I think it is important to make a distinction between setting the total
number of map/reduce tasks and the number that can run (per job) at any
given time.  I would like to restrict only the latter, while allowing
Hadoop to divide the data into chunks as it sees fit.


-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 09, 2008 1:50 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Question on running simultaneous jobs


You may need to upgrade, but 0.15.1 handles multiple jobs in the
cluster just fine.  Use conf.setNumMapTasks(int) and
conf.setNumReduceTasks(int).
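For concreteness, a minimal sketch of the calls Ted mentions, using the old JobConf API (SimulJob and the task counts are placeholders):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SimulJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SimulJob.class);
            // setNumMapTasks is only a hint; the framework still derives the
            // actual map count from the input splits. setNumReduceTasks is exact.
            conf.setNumMapTasks(100);
            conf.setNumReduceTasks(10);
            // ... set mapper, reducer, and input/output paths as usual ...
            JobClient.runJob(conf);
        }
    }

Note that these set the total task counts for the job, not the number allowed to run concurrently - which is exactly the distinction Xavier draws above.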


On 1/9/08 11:25 AM, "Xavier Stevens" <[EMAIL PROTECTED]> wrote:

Does Hadoop support running simultaneous jobs?  If so, what parameters
do I need to set in my job configuration?  We basically want to give a
job that takes a really long time half of the total resources of the
cluster, so other jobs don't queue up behind it.

I am using Hadoop 0.14.2 currently.  I tried setting
mapred.tasktracker.tasks.maximum to half of the maximum specified
in mapred-default.xml.  This shows the change in the web
administration page for the job, but it has no effect on the actual
number of tasks running.

Thanks,

Xavier
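A plausible reading of why the override showed in the web UI but changed nothing: mapred.tasktracker.tasks.maximum is a daemon-side setting that each TaskTracker reads from its own configuration at startup, so a value set in the submitted job's configuration is recorded (and displayed) but never consulted for scheduling. Roughly, and treating the default of 2 as an assumption:

    import org.apache.hadoop.conf.Configuration;

    public class TrackerLimitSketch {
        public static void main(String[] args) {
            // Each TaskTracker reads this limit from its own hadoop-site.xml /
            // mapred-default.xml when the daemon starts; the per-job
            // configuration is never consulted, so a job-level override is
            // cosmetic. The default of 2 is an assumption from old releases.
            Configuration daemonConf = new Configuration();
            int maxTasks = daemonConf.getInt("mapred.tasktracker.tasks.maximum", 2);
            System.out.println("concurrent tasks per tracker: " + maxTasks);
        }
    }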






