Thanks, Bertrand and Hemanth, for your prompt replies! This helps :)

Regards,
Safdar
On Mon, Sep 10, 2012 at 2:18 PM, Bertrand Dechoux <decho...@gmail.com> wrote:

> If that is only for benchmarking, you could stop the task-trackers on the
> machines you don't want to use. Or you could set up another cluster.
>
> But yes, there is no standard way to limit the slots taken by a job to a
> specified set of machines. You might be able to do it using a custom
> Scheduler, but that would be out of your scope, I guess.
>
> Regards
>
> Bertrand
>
> On Mon, Sep 10, 2012 at 12:01 PM, Hemanth Yamijala <yhema...@gmail.com> wrote:
>
> > Hi,
> >
> > I am not sure if there's any way to restrict the tasks to specific
> > machines. However, I think there are some ways of restricting the
> > number of 'slots' that can be used by the job.
> >
> > Also, not sure which version of Hadoop you are on. The CapacityScheduler
> > (http://hadoop.apache.org/common/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
> > has ways by which you can set up a queue with a hard capacity limit.
> > The capacity controls the number of slots that can be used by jobs
> > submitted to the queue. So, if you submit a job to the queue,
> > irrespective of the number of tasks it has, it should limit it to
> > those slots. However, please note that this does not restrict the
> > tasks to specific machines.
> >
> > Thanks
> > Hemanth
> >
> > On Mon, Sep 10, 2012 at 2:36 PM, Safdar Kureishy <safdar.kurei...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I need to run some benchmarking tests for a given mapreduce job on a
> > > *subset* of a 10-node Hadoop cluster. Not that it matters, but the
> > > current cluster settings allow for ~20 map slots and 10 reduce slots
> > > per node.
> > >
> > > Without loss of generality, let's say I want a job with these
> > > constraints:
> > > - to use only *5* out of the 10 nodes for running the mappers,
> > > - to use only *5* out of the 10 nodes for running the reducers.
> > >
> > > Is there any other way of achieving this through Hadoop property
> > > overrides during job-submission time? I understand that the Fair
> > > Scheduler can potentially be used to create pools with a proportionate
> > > number of mappers and reducers, to achieve a similar outcome, but the
> > > problem is that I still cannot tie such a pool to a fixed number of
> > > machines (right?). Essentially, regardless of the number of map/reduce
> > > tasks involved, I only want a *fixed number of machines* to handle the
> > > job.
> > >
> > > Any tips on how I can go about achieving this?
> > >
> > > Thanks,
> > > Safdar
>
> --
> Bertrand Dechoux
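For reference, once a capped queue has been defined in the scheduler configuration on the cluster side (as Hemanth describes), directing the benchmark job to that queue from the client is just a property override at submission time. Below is a minimal sketch, assuming a queue named "benchmark" already exists; the class name, paths, and queue name are placeholders, not anything confirmed in the thread, and the job body uses the default identity mapper/reducer only to keep the example short.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BenchmarkJobSubmitter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Route this job to the capped queue. "benchmark" is an assumed queue
        // name; the queue itself (and its hard capacity limit) must already be
        // defined in the scheduler configuration on the cluster side.
        conf.set("mapred.job.queue.name", "benchmark");   // MRv1 property name
        // On YARN/MRv2 the equivalent property is "mapreduce.job.queuename".

        Job job = new Job(conf, "benchmark-run");
        job.setJarByClass(BenchmarkJobSubmitter.class);

        // Identity map/reduce (the defaults) keep the sketch minimal; plug in
        // the real mapper/reducer classes for the actual benchmark job.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If the real job already goes through ToolRunner/GenericOptionsParser, the same override can be passed on the command line instead, e.g. -Dmapred.job.queue.name=benchmark, without any code change. As noted above, though, this caps the number of slots the job can occupy; it does not pin the tasks to a specific set of machines.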