Is there a rule of thumb for setting the maximum number of mappers and 
reducers per task tracker, via the mapred.tasktracker.{map,reduce}.tasks.maximum 
properties? My data nodes each have 24 cores (4 CPUs with 6 cores each) and 
24 GB of RAM. The child processes run with -Xmx1024m, so up to 1 GB of heap each.

I currently have both maximums set to 16. That can result in up to 32 
processes (16 mappers and 16 reducers), which is more processes than cores and 
more potential heap use than physical memory. On the other hand, it leaves 
resources unused when I run a map-only job: only 16 mapper processes are used, 
so 8 cores and 8 GB aren't doing much.
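
To make the arithmetic above concrete, here is a rough Python sketch of the 
worst-case and map-only numbers for one node. The function name slot_budget and 
its parameters are made up for illustration, and it counts only the -Xmx heap 
of the child processes; JVM overhead and the DataNode/TaskTracker daemons 
themselves are ignored, so the real memory picture is probably a bit worse.

```python
def slot_budget(cores, ram_gb, map_slots, reduce_slots, heap_gb_per_child):
    """Summarize slot demand for one task tracker (hypothetical helper).

    Assumes every slot is occupied by a child JVM that uses its full heap.
    """
    total_slots = map_slots + reduce_slots
    total_heap_gb = total_slots * heap_gb_per_child
    map_only_heap_gb = map_slots * heap_gb_per_child
    return {
        # Worst case: all map and reduce slots busy at once.
        "max_processes": total_slots,
        "max_heap_gb": total_heap_gb,
        "oversubscribed_cores": max(0, total_slots - cores),
        "oversubscribed_ram_gb": max(0, total_heap_gb - ram_gb),
        # Map-only job: reduce slots sit empty.
        "idle_cores_map_only": max(0, cores - map_slots),
        "idle_ram_gb_map_only": max(0, ram_gb - map_only_heap_gb),
    }


# The node described in this post: 24 cores, 24 GB RAM, 16 + 16 slots, 1 GB heap.
budget = slot_budget(cores=24, ram_gb=24, map_slots=16, reduce_slots=16,
                     heap_gb_per_child=1)
for name, value in budget.items():
    print(f"{name}: {value}")
```

Plugging in other slot counts (for example 12 and 12, or 16 and 8) shows how 
the oversubscription and the idle capacity trade off against each other.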

What have others been setting these values to, and for what hardware?
