What you are asking for (and much more sophisticated "slicing/dicing" of the cluster) is possible with MapR's distro. Please contact me offline if you are interested, or try it for yourself at www.mapr.com/download
On Mon, Sep 10, 2012 at 2:06 AM, Safdar Kureishy <safdar.kurei...@gmail.com> wrote:
> Hi,
>
> I need to run some benchmarking tests for a given mapreduce job on a
> *subset* of a 10-node Hadoop cluster. Not that it matters, but the
> current cluster settings allow for ~20 map slots and 10 reduce slots
> per node.
>
> Without loss of generality, let's say I want a job with these
> constraints:
> - to use only *5* out of the 10 nodes for running the mappers,
> - to use only *5* out of the 10 nodes for running the reducers.
>
> Is there a way of achieving this through Hadoop property overrides at
> job-submission time? I understand that the Fair Scheduler could
> potentially be used to create pools with a proportionate # of map and
> reduce slots, to achieve a similar outcome, but the problem is that I
> still cannot tie such a pool to a fixed # of machines (right?).
> Essentially, regardless of the # of map/reduce tasks involved, I only
> want a *fixed # of machines* to handle the job.
>
> Any tips on how I can go about achieving this?
>
> Thanks,
> Safdar
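
For reference, here is a minimal sketch of the Fair Scheduler approach mentioned in the question, assuming a Hadoop 1.x cluster with the Fair Scheduler enabled. The pool name "benchmark" and the caps are illustrative only (roughly half of the cluster's ~200 map and ~100 reduce slots):

    <?xml version="1.0"?>
    <!-- fair-scheduler.xml, the allocation file pointed to by
         mapred.fairscheduler.allocation.file in mapred-site.xml.
         Caps the "benchmark" pool at about half the cluster's slots.
         Note: this limits the *number of slots* the job can hold at
         once, not *which* 5 machines those slots come from. -->
    <allocations>
      <pool name="benchmark">
        <maxMaps>100</maxMaps>
        <maxReduces>50</maxReduces>
      </pool>
    </allocations>

A job can then be routed into the pool at submission time, e.g. with -Dpool.name=benchmark (or whatever job property mapred.fairscheduler.poolnameproperty is configured to read; some versions also accept mapred.fairscheduler.pool directly). This illustrates the limitation Safdar raises: the cap is enforced cluster-wide in slots, so the job's tasks may still be scheduled across all 10 nodes rather than being pinned to a fixed 5.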