It's a stopgap and doesn't seem to be working well for Y!: https://issues.apache.org/jira/browse/HADOOP-2510
On Jan 9, 2008 5:30 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> What is the status of Hadoop on Demand? Is it ready for prime time?
>
>
> On 1/9/08 4:58 PM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote:
>
> > I will add to the discussion that the ability to have multiple tasks
> > of equal priority all making progress simultaneously is important in
> > academic environments. There are a number of undergraduate programs
> > that are starting to use Hadoop in code labs for students.
> >
> > Multiple students should be able to submit jobs, and if one student's
> > poorly written task is grinding up a lot of cycles on a shared
> > cluster, other students still need to be able to test their code in
> > the meantime; ideally, they would not need to enter a lengthy job
> > queue. ... I'd say that this actually applies to development clusters
> > in general, where individual task performance is less important than
> > the ability of multiple developers to test code concurrently.
> >
> > - Aaron
> >
> >
> > Joydeep Sen Sarma wrote:
> >>> that can run (per job) at any given time.
> >>
> >> Not possible afaik - but I will be happy to hear otherwise.
> >>
> >> Priorities are a good substitute, though. There's no point needlessly
> >> restricting concurrency if there's nothing else to run. If there is
> >> something else more important to run, then in most cases assigning a
> >> higher priority to that other thing would make the right thing
> >> happen.
> >>
> >> The exception is long-running tasks (usually reducers) that cannot be
> >> preempted. (Hadoop does not seem to use OS process priorities at
> >> all. I wonder if process priorities could be used as a substitute
> >> for preemption.)
> >>
> >> HOD is another solution that you might want to look into - my
> >> understanding is that with HOD you can restrict the number of
> >> machines used by a job.
> >>
> >> ________________________________
> >>
> >> From: Xavier Stevens [mailto:[EMAIL PROTECTED]]
> >> Sent: Wed 1/9/2008 2:57 PM
> >> To: hadoop-user@lucene.apache.org
> >> Subject: RE: Question on running simultaneous jobs
> >>
> >> This doesn't solve the issue because it sets the total number of
> >> map/reduce tasks. When setting the total number of map tasks I get
> >> an ArrayIndexOutOfBoundsException within Hadoop; I believe because
> >> of the input dataset size (around 90 million lines).
> >>
> >> I think it is important to make a distinction between setting the
> >> total number of map/reduce tasks and the number that can run (per
> >> job) at any given time. I would like to restrict only the latter,
> >> while allowing Hadoop to divide the data into chunks as it sees fit.
> >>
> >>
> >> -----Original Message-----
> >> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> >> Sent: Wednesday, January 09, 2008 1:50 PM
> >> To: hadoop-user@lucene.apache.org
> >> Subject: Re: Question on running simultaneous jobs
> >>
> >> You may need to upgrade, but 0.15.1 does just fine with multiple
> >> jobs in the cluster. Use conf.setNumMapTasks(int) and
> >> conf.setNumReduceTasks(int).
> >>
> >>
> >> On 1/9/08 11:25 AM, "Xavier Stevens" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Does Hadoop support running simultaneous jobs? If so, what
> >>> parameters do I need to set in my job configuration? We basically
> >>> want to give a job that takes a really long time half of the total
> >>> resources of the cluster so other jobs don't queue up behind it.
> >>>
> >>> I am using Hadoop 0.14.2 currently. I tried setting
> >>> mapred.tasktracker.tasks.maximum to half of the maximum specified
> >>> in mapred-default.xml. This shows the change in the web
> >>> administration page for the job, but it has no effect on the actual
> >>> number of tasks running.
> >>>
> >>> Thanks,
> >>>
> >>> Xavier
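
For readers landing on this thread: Ted's conf.setNumMapTasks(int) /
conf.setNumReduceTasks(int) advice looks roughly like the sketch below
on the 0.15-era "mapred" API. This is a minimal sketch, not the actual
job from the thread; the class name, paths, and counts are placeholder
assumptions.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SimultaneousJobExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SimultaneousJobExample.class);
            conf.setJobName("simultaneous-job-example");

            // 0.15-era path setters on JobConf (later versions moved
            // these to FileInputFormat / FileOutputFormat helpers).
            conf.setInputPath(new Path(args[0]));
            conf.setOutputPath(new Path(args[1]));

            // Total task counts for the job. The map count is only a
            // hint to the InputFormat; the reduce count is honored
            // exactly. Neither caps how many tasks run at once.
            conf.setNumMapTasks(100);
            conf.setNumReduceTasks(10);

            JobClient.runJob(conf);
        }
    }

As Xavier points out above, these control how the job is divided into
tasks in total, not how many of its tasks execute concurrently.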
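Joydeep's priority suggestion is applied per job. A minimal sketch,
assuming the mapred.job.priority property (values VERY_LOW through
VERY_HIGH) is honored by the version in use; the class and method names
are placeholders:

    import org.apache.hadoop.mapred.JobConf;

    public class PriorityExample {
        // Raise a job's scheduling priority so the JobTracker favors
        // its tasks when slots free up. As noted in the thread, this
        // does not preempt tasks that are already running.
        static void raisePriority(JobConf conf) {
            conf.set("mapred.job.priority", "VERY_HIGH");
        }
    }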
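As for why Xavier's mapred.tasktracker.tasks.maximum change showed up
in the web UI but changed nothing: each TaskTracker reads that property
from its own node-local configuration when it starts, so a value
submitted with the job's conf is carried along (and displayed) but
ignored by the trackers. A sketch of the node-side setting, assuming
the 0.14-era property name; the value shown is arbitrary:

    <!-- hadoop-site.xml on each TaskTracker node; restart the
         TaskTracker to pick this up. It caps concurrent tasks per node
         across all jobs combined; it is not a per-job limit. -->
    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>4</value>
    </property>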