> that can run (per job) at any given time.

Not possible AFAIK - but I will be happy to hear otherwise. Priorities are a good substitute, though: there's no point needlessly restricting concurrency if there's nothing else to run, and if there is something more important to run, then in most cases assigning a higher priority to that other job makes the right thing happen - except with long-running tasks (usually reducers) that cannot be preempted. (Hadoop does not seem to use OS process priorities at all. I wonder whether process priorities could be used as a substitute for preemption.)

HOD is another solution you might want to look into - my understanding is that with HOD you can restrict the number of machines used by a job.

________________________________
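The "OS process priorities as a substitute for preemption" idea floated above could be sketched as a shell loop that renices a job's task JVMs. This is purely a hypothetical sketch, not anything Hadoop does: the `TaskTracker$Child` pattern is an assumption about how the child task JVMs show up in `ps` output, and deciding *which* pids belong to the low-priority job is left unsolved here.

```shell
# Hypothetical sketch: lower the CPU scheduling priority of a job's task
# JVMs so tasks from other jobs get the CPU first. The ps pattern below is
# an assumption about the child JVM's main class, not a documented interface.
PATTERN='TaskTracker$Child'
for pid in $(ps -eo pid,args | grep -F "$PATTERN" | grep -v grep | awk '{print $1}'); do
    renice -n 10 -p "$pid"    # +10 = lower priority; needs suitable permissions
done

# 'nice' applies the same idea when launching a process in the first place:
nice -n 10 echo "started at low priority"
```

Note this only deprioritizes CPU time on each node; it does not free up task slots, so it is at best a weak approximation of preemption.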
From: Xavier Stevens [mailto:[EMAIL PROTECTED]]
Sent: Wed 1/9/2008 2:57 PM
To: hadoop-user@lucene.apache.org
Subject: RE: Question on running simultaneous jobs

This doesn't solve the issue, because it sets the total number of map/reduce tasks. When I set the total number of map tasks I get an ArrayIndexOutOfBoundsException inside Hadoop, I believe because of the input dataset size (around 90 million lines). I think it is important to distinguish between setting the total number of map/reduce tasks and the number that can run (per job) at any given time. I would like to restrict only the latter, while allowing Hadoop to divide the data into chunks as it sees fit.

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 09, 2008 1:50 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Question on running simultaneous jobs

You may need to upgrade, but 0.15.1 does just fine with multiple jobs in the cluster. Use conf.setNumMapTasks(int) and conf.setNumReduceTasks(int).

On 1/9/08 11:25 AM, "Xavier Stevens" <[EMAIL PROTECTED]> wrote:

> Does Hadoop support running simultaneous jobs? If so, what parameters
> do I need to set in my job configuration? We basically want to give a
> job that takes a really long time half of the total resources of the
> cluster, so other jobs don't queue up behind it.
>
> I am currently using Hadoop 0.14.2. I tried setting
> mapred.tasktracker.tasks.maximum to half of the maximum specified
> in mapred-default.xml. This shows the change in the web
> administration page for the job, but it has no effect on the actual
> number of tasks running.
>
> Thanks,
>
> Xavier
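For reference, the property Xavier tried is a per-TaskTracker setting (typically overridden in hadoop-site.xml), which explains the behavior he saw: it caps how many task slots a *node* offers to all jobs combined, not how many tasks one job may run at once. A sketch of the override, with an illustrative value:

```xml
<!-- hadoop-site.xml: caps concurrent tasks per TaskTracker, across ALL jobs.
     It is not a per-job limit, so it cannot reserve half the cluster for
     one long-running job. The value 2 is illustrative. -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
</property>
```

The web UI can display the changed value while the running task count stays the same if the TaskTrackers were not restarted with the new configuration, or if the override was placed where only the client reads it.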