What is the status of Hadoop on Demand? Is it ready for prime time?
On 1/9/08 4:58 PM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote:

> I will add to the discussion that the ability to have multiple tasks of
> equal priority all making progress simultaneously is important in
> academic environments. A number of undergraduate programs are starting
> to use Hadoop in code labs for students.
>
> Multiple students should be able to submit jobs, and if one student's
> poorly written task is grinding up a lot of cycles on a shared cluster,
> other students still need to be able to test their code in the meantime;
> ideally, they would not need to enter a lengthy job queue. ... I'd say
> that this actually applies to development clusters in general, where
> individual task performance is less important than the ability of
> multiple developers to test code concurrently.
>
> - Aaron
>
> Joydeep Sen Sarma wrote:
>>> that can run (per job) at any given time.
>>
>> Not possible afaik - but I will be happy to hear otherwise.
>>
>> Priorities are a good substitute, though. There's no point needlessly
>> restricting concurrency if there's nothing else to run. If there is
>> something else more important to run, then in most cases assigning a
>> higher priority to that other thing would make the right thing happen.
>>
>> The exception is long-running tasks (usually reducers) that cannot be
>> preempted. (Hadoop does not seem to use OS process priorities at all;
>> I wonder if process priorities could be used as a substitute for
>> preemption.)
>>
>> HOD is another solution you might want to look into - my understanding
>> is that with HOD you can restrict the number of machines used by a job.
>>
>> ________________________________
>>
>> From: Xavier Stevens [mailto:[EMAIL PROTECTED]
>> Sent: Wed 1/9/2008 2:57 PM
>> To: hadoop-user@lucene.apache.org
>> Subject: RE: Question on running simultaneous jobs
>>
>> This doesn't solve the issue, because it sets the total number of
>> map/reduce tasks. When I set the total number of map tasks I get an
>> ArrayIndexOutOfBoundsException within Hadoop, I believe because of the
>> input dataset size (around 90 million lines).
>>
>> I think it is important to distinguish between setting the total number
>> of map/reduce tasks and the number that can run (per job) at any given
>> time. I would like to restrict only the latter, while allowing Hadoop
>> to divide the data into chunks as it sees fit.
>>
>> -----Original Message-----
>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, January 09, 2008 1:50 PM
>> To: hadoop-user@lucene.apache.org
>> Subject: Re: Question on running simultaneous jobs
>>
>> You may need to upgrade, but 0.15.1 does just fine with multiple jobs
>> in the cluster. Use conf.setNumMapTasks(int) and
>> conf.setNumReduceTasks(int).
>>
>> On 1/9/08 11:25 AM, "Xavier Stevens" <[EMAIL PROTECTED]> wrote:
>>
>>> Does Hadoop support running simultaneous jobs? If so, what parameters
>>> do I need to set in my job configuration? We basically want to give a
>>> job that takes a really long time half of the total resources of the
>>> cluster, so other jobs don't queue up behind it.
>>>
>>> I am currently using Hadoop 0.14.2. I tried setting
>>> mapred.tasktracker.tasks.maximum to half of the maximum specified in
>>> mapred-default.xml. The change shows up in the web administration
>>> page for the job, but it has no effect on the actual number of tasks
>>> running.
>>>
>>> Thanks,
>>>
>>> Xavier
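Pulling Ted's and Joydeep's suggestions together, here is a minimal
sketch of per-job setup against the old mapred API of that era. The
class name, paths, and task counts are placeholders, and the
mapred.job.priority property name and its accepted values are my
recollection of that release line rather than anything confirmed in
this thread, so verify them against your version.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitExample.class);
            conf.setJobName("example");
            conf.setInputPath(new Path("/data/in"));   // 0.15-era setters;
            conf.setOutputPath(new Path("/data/out")); // paths are placeholders

            // Total task counts for the job. The map count is only a hint:
            // the framework derives the real number from the input splits,
            // which is exactly the distinction Xavier draws above.
            conf.setNumMapTasks(200);
            conf.setNumReduceTasks(10);

            // Scheduling priority relative to other queued jobs, the
            // substitute Joydeep suggests. Assumed values for this era:
            // VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.
            conf.set("mapred.job.priority", "HIGH");

            JobClient.runJob(conf);
        }
    }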
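On Xavier's original observation: as I understand the 0.14-era behavior,
mapred.tasktracker.tasks.maximum is a daemon-level setting that each
TaskTracker reads from its own configuration at startup. Setting it in a
job's JobConf changes what the job's page displays without affecting
trackers that are already running, which would explain why the web UI
showed the change but the number of running tasks did not move. A true
per-job cap on concurrently running tasks is exactly what Joydeep says
is not possible in this era.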