Hi,

> The optimization of one Hadoop job I'm running would benefit from knowing
> the maximum number of map slots in the Hadoop cluster.
>
> This number can be obtained (if my understanding is correct) by:
>
> * parsing the mapred-site.xml file to get the
>   mapred.tasktracker.map.tasks.maximum value (assuming it is set of
>   course)
> * parsing the slaves file to get the maximum number of compute nodes in
>   the cluster
> * multiplying the 2 values
>
> My question is:
> I would like to learn about *all* possible ways to get this information
> through API calls (either the Hadoop Common API or the Hadoop MapReduce
> API), i.e. obtaining it through a Job object, through a Configuration
> object,...
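For comparison, the file-based calculation described in the quoted steps could be sketched roughly as below, using only the JDK. The class name MapSlotEstimator, its method names and the fallback parameter are made up for illustration and are not part of any Hadoop API. The sketch assumes the usual <property><name>/<value> layout of mapred-site.xml and a slaves file with one host per line, where '#' starts a comment.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not a Hadoop class: estimates map slots from config files.
class MapSlotEstimator {
    static final String KEY = "mapred.tasktracker.map.tasks.maximum";

    // Returns the configured per-node map slot count, or `fallback` if the
    // property is absent from the given mapred-site.xml stream.
    static int mapSlotsPerNode(InputStream siteXml, int fallback) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(siteXml);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                if (KEY.equals(text(p, "name"))) {
                    return Integer.parseInt(text(p, "value"));
                }
            }
            return fallback;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read " + KEY, e);
        }
    }

    // Trimmed text of the first child element named `tag`, or "" if missing.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    // Counts host entries, skipping blank lines and anything after '#'.
    static int countSlaves(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            int hash = line.indexOf('#');
            String host = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!host.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Step 3 of the quoted list: multiply the two values.
    static int estimateMaxMapSlots(InputStream siteXml, List<String> slaves, int fallback) {
        return mapSlotsPerNode(siteXml, fallback) * countSlaves(slaves);
    }

    // Usage: java MapSlotEstimator <path to mapred-site.xml> <path to slaves>
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            List<String> slaves = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
            // Fallback of 2 is an assumption; check mapred-default.xml for your version.
            System.out.println(estimateMaxMapSlots(in, slaves, 2));
        }
    }
}
```

Note that this only reflects the files on the machine where it runs. Each TaskTracker reads the property from its own configuration, so nodes with different settings, or nodes that are listed but down, would make the estimate differ from what the cluster actually offers.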
The easiest way I can think of is to use org.apache.hadoop.mapred.ClusterStatus.getMaxMapTasks(). You can get an instance of ClusterStatus by calling JobClient.getClusterStatus().

Thanks
hemanth