We have several 8-processor machines in our cluster, and for most of our mapper tasks we would like to spawn 8 per machine.

We have one mapper task that is extremely resource-intensive, so for it we can only spawn 1 per machine.

We do have multiple disk arms for our DFS, so we would like to run multiple reduce tasks on each machine.

We have had little luck changing these parameters by setting them via JobConf:
jobConf.setNumMapTasks(int n)
jobConf.setNumReduceTasks(int n)
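
For concreteness, here is roughly what we are doing (a minimal sketch; the class name, job name, and task counts are placeholders, and the mapper/reducer/input/output setup is omitted):

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TaskCountSketch {
  public static void main(String[] args) throws IOException {
    JobConf jobConf = new JobConf(TaskCountSketch.class);
    jobConf.setJobName("task-count-sketch");
    // (mapper, reducer, and input/output paths omitted for brevity)
    jobConf.setNumMapTasks(8);    // desired number of map tasks for this job
    jobConf.setNumReduceTasks(4); // desired number of reduce tasks for this job
    JobClient.runJob(jobConf);    // blocks until the job completes
  }
}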

What we have ended up doing is reconfiguring the cluster by editing hadoop-site.xml between runs, which is awkward.
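
The entries we edit between runs look something like the following (illustrative values; as far as we can tell the relevant keys are the per-TaskTracker slot maxima, though the exact names depend on the Hadoop release):

<!-- hadoop-site.xml: per-TaskTracker task slots, illustrative values.
     Older releases used a single combined mapred.tasktracker.tasks.maximum. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>   <!-- map task slots per TaskTracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>   <!-- reduce task slots per TaskTracker -->
</property>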

Have we just fumble-fingered it, or is there a way that we are missing to set the concurrency for mappers and reducers on a per-job basis?
