We have several 8-processor machines in our cluster, and for most of our
map tasks we would like to spawn 8 per machine.
We have 1 map task that is extremely resource intensive, and for that one
we can only spawn 1 per machine.
We do have multiple disk arms behind our DFS, so we would also like to run
multiple reduce tasks on each machine.
We have had little luck changing these parameters by setting the task
counts via JobConf:

    jobConf.setNumMapTasks(int n)
    jobConf.setNumReduceTasks(int n)
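
For reference, here is roughly what we are doing per job (a minimal,
self-contained sketch using an identity mapper/reducer and placeholder
paths, not our real job):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class TaskCountTest {
        public static void main(String[] args) throws Exception {
            JobConf jobConf = new JobConf(TaskCountTest.class);
            jobConf.setJobName("task-count-test");

            // Identity mapper/reducer, just to exercise the task counts.
            jobConf.setMapperClass(IdentityMapper.class);
            jobConf.setReducerClass(IdentityReducer.class);
            jobConf.setOutputKeyClass(LongWritable.class);
            jobConf.setOutputValueClass(Text.class);

            FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
            FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));

            // We expected these to bound how many tasks run at once, but
            // they only seem to hint at the total number of tasks for the
            // whole job, not per-machine concurrency.
            jobConf.setNumMapTasks(8);
            jobConf.setNumReduceTasks(4);

            JobClient.runJob(jobConf);
        }
    }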
What we have ended up doing is reconfiguring the cluster by editing
hadoop-site.xml between runs, which is awkward.
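
Concretely, what we toggle in hadoop-site.xml between runs are the
per-TaskTracker slot maximums (assuming we have the right knobs; as far as
we can tell these apply cluster-wide, not per job):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <!-- dropped to 1 before running the resource-intensive job -->
      <value>8</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>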
Have we just fumble-fingered it, or is there a way that we are missing to
set the concurrency for mappers and reducers on a per-job basis?