It's a stopgap and doesn't seem to be working well for Y!: https://issues.apache.org/jira/browse/HADOOP-2510
On Jan 9, 2008 5:30 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
> What is the status of Hadoop on Demand? Is it ready for prime time?
>
>
> On 1/9/08 4:58 PM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote:
>
> > I will add to the discussion that the ability to have multiple tasks
> > of equal priority all making progress simultaneously is important in
> > academic environments. There are a number of undergraduate programs
> > that are starting to use Hadoop in code labs for students.
> >
> > Multiple students should be able to submit jobs, and if one student's
> > poorly written task is grinding up a lot of cycles on a shared
> > cluster, other students still need to be able to test their code in
> > the meantime; ideally, they would not need to enter a lengthy job
> > queue. ... I'd say that this actually applies to development clusters
> > in general, where individual task performance is less important than
> > the ability of multiple developers to test code concurrently.
> >
> > - Aaron
> >
> >
> > Joydeep Sen Sarma wrote:
> >>> that can run (per job) at any given time.
> >>
> >> Not possible afaik - but I will be happy to hear otherwise.
> >>
> >> Priorities are a good substitute, though. There's no point needlessly
> >> restricting concurrency if there's nothing else to run. If there is
> >> something else more important to run, then in most cases assigning a
> >> higher priority to that other thing would make the right thing
> >> happen.
> >>
> >> The exception is long-running tasks (usually reducers) that cannot be
> >> preempted. (Hadoop does not seem to use OS process priorities at
> >> all. I wonder if process priorities could be used as a substitute
> >> for preemption.)
> >>
> >> HOD is another solution that you might want to look into - my
> >> understanding is that with HOD you can restrict the number of
> >> machines used by a job.
> >>
> >> ________________________________
> >>
> >> From: Xavier Stevens [mailto:[EMAIL PROTECTED]]
> >> Sent: Wed 1/9/2008 2:57 PM
> >> To: hadoop-user@lucene.apache.org
> >> Subject: RE: Question on running simultaneous jobs
> >>
> >> This doesn't solve the issue because it sets the total number of
> >> map/reduce tasks. When setting the total number of map tasks I get
> >> an ArrayIndexOutOfBoundsException within Hadoop; I believe because
> >> of the input dataset size (around 90 million lines).
> >>
> >> I think it is important to make a distinction between setting the
> >> total number of map/reduce tasks and the number that can run (per
> >> job) at any given time. I would like to restrict only the latter,
> >> while allowing Hadoop to divide the data into chunks as it sees fit.
> >>
> >>
> >> -----Original Message-----
> >> From: Ted Dunning [mailto:[EMAIL PROTECTED]]
> >> Sent: Wednesday, January 09, 2008 1:50 PM
> >> To: hadoop-user@lucene.apache.org
> >> Subject: Re: Question on running simultaneous jobs
> >>
> >> You may need to upgrade, but 0.15.1 does just fine with multiple
> >> jobs in the cluster. Use conf.setNumMapTasks(int) and
> >> conf.setNumReduceTasks(int).
> >>
> >>
> >> On 1/9/08 11:25 AM, "Xavier Stevens" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Does Hadoop support running simultaneous jobs? If so, what
> >>> parameters do I need to set in my job configuration? We basically
> >>> want to give a job that takes a really long time half of the total
> >>> resources of the cluster so other jobs don't queue up behind it.
> >>>
> >>> I am using Hadoop 0.14.2 currently. I tried setting
> >>> mapred.tasktracker.tasks.maximum to half of the maximum specified
> >>> in mapred-default.xml. This shows the change in the web
> >>> administration page for the job, but it has no effect on the actual
> >>> number of tasks running.
> >>>
> >>> Thanks,
> >>>
> >>> Xavier
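
For readers landing on this thread: Ted's conf.setNumMapTasks(int) /
conf.setNumReduceTasks(int) advice looks roughly like the sketch below
on the 0.15-era "mapred" API. This is a minimal sketch, not the actual
job from the thread; the class name, paths, and counts are placeholder
assumptions.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SimultaneousJobExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SimultaneousJobExample.class);
            conf.setJobName("simultaneous-job-example");

            // 0.15-era path setters on JobConf (later versions moved
            // these to FileInputFormat / FileOutputFormat helpers).
            conf.setInputPath(new Path(args[0]));
            conf.setOutputPath(new Path(args[1]));

            // Total task counts for the job. The map count is only a
            // hint to the InputFormat; the reduce count is honored
            // exactly. Neither caps how many tasks run at once.
            conf.setNumMapTasks(100);
            conf.setNumReduceTasks(10);

            JobClient.runJob(conf);
        }
    }

As Xavier points out above, these control how the job is divided into
tasks in total, not how many of its tasks execute concurrently.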
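Joydeep's priority suggestion is applied per job. A minimal sketch,
assuming the mapred.job.priority property (values VERY_LOW through
VERY_HIGH) is honored by the version in use; the class and method names
are placeholders:

    import org.apache.hadoop.mapred.JobConf;

    public class PriorityExample {
        // Raise a job's scheduling priority so the JobTracker favors
        // its tasks when slots free up. As noted in the thread, this
        // does not preempt tasks that are already running.
        static void raisePriority(JobConf conf) {
            conf.set("mapred.job.priority", "VERY_HIGH");
        }
    }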
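As for why Xavier's mapred.tasktracker.tasks.maximum change showed up
in the web UI but changed nothing: each TaskTracker reads that property
from its own node-local configuration when it starts, so a value
submitted with the job's conf is carried along (and displayed) but
ignored by the trackers. A sketch of the node-side setting, assuming
the 0.14-era property name; the value shown is arbitrary:

    <!-- hadoop-site.xml on each TaskTracker node; restart the
         TaskTracker to pick this up. It caps concurrent tasks per node
         across all jobs combined; it is not a per-job limit. -->
    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>4</value>
    </property>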