Is there a rule of thumb for setting the maximum number of mappers and reducers per TaskTracker via the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties? My data nodes have 24 cores (4 CPUs with 6 cores each) and 24 GB of RAM. The child processes run with -Xmx1024m, so each gets a 1 GB heap.
I currently have both maximums set to 16. This can result in up to 32 concurrent child processes (16 mappers and 16 reducers), which is more processes than cores and up to 32 GB of child heap on a node with only 24 GB of physical memory. On the other hand, a map-only job can use at most the 16 map slots, which leaves 8 cores and roughly 8 GB of memory largely idle. What have others set these values to, and on what hardware?
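To make the trade-off concrete, here is a small Python sketch of the arithmetic above. NodeSpec, SlotConfig and slot_budget are hypothetical illustrative names, not Hadoop APIs, and the sketch assumes each child uses exactly its -Xmx heap, which underestimates real usage (JVM overhead, the OS, and the DataNode/TaskTracker daemons are ignored).

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    cores: int      # physical cores on the data node
    ram_gb: float   # physical memory on the data node, in GB


@dataclass
class SlotConfig:
    map_slots: int        # value of mapred.tasktracker.map.tasks.maximum
    reduce_slots: int     # value of mapred.tasktracker.reduce.tasks.maximum
    child_heap_gb: float  # -Xmx of each child JVM, in GB


def slot_budget(node: NodeSpec, slots: SlotConfig) -> dict:
    """Compare worst-case and map-only slot usage against node capacity.

    Assumption: memory per child equals its -Xmx heap; JVM overhead and
    the Hadoop daemons are ignored, so real memory pressure is higher.
    """
    peak_procs = slots.map_slots + slots.reduce_slots
    peak_heap = peak_procs * slots.child_heap_gb
    map_only_heap = slots.map_slots * slots.child_heap_gb
    return {
        # All map and reduce slots busy at the same time.
        "peak_processes": peak_procs,
        "excess_processes": max(peak_procs - node.cores, 0),
        "peak_heap_gb": peak_heap,
        "heap_overcommit_gb": max(peak_heap - node.ram_gb, 0),
        # Map-only job: reduce slots sit empty.
        "map_only_idle_cores": max(node.cores - slots.map_slots, 0),
        "map_only_idle_ram_gb": max(node.ram_gb - map_only_heap, 0),
    }


if __name__ == "__main__":
    node = NodeSpec(cores=24, ram_gb=24)
    config = SlotConfig(map_slots=16, reduce_slots=16, child_heap_gb=1.0)
    for key, value in slot_budget(node, config).items():
        print(f"{key}: {value}")
```

With the numbers from this question (24 cores, 24 GB, 16 + 16 slots, 1 GB heaps), the sketch reports 32 peak processes (8 more than cores), 32 GB of peak heap (8 GB over physical RAM), and 8 idle cores plus 8 GB of idle RAM during a map-only job.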
