On Jun 6, 2012, at 03:42, Harsh J wrote:

>> I think mapred.tasktracker.map.tasks.maximum sets the number of map
>> tasks and not slots.
>
> This is incorrect. The property does configure slots. Please also see
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces and
> http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F
> for more.
But Harsh, wouldn't you agree that the first reference you provided above is talking about the number of tasks spawned for a given job at job runtime, not the number of slots hard-configured into the cluster at cluster spin-up time?

Incidentally, the second reference above is partially broken. It attempts to offer links to further detail about mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, but both links are dead. For example, one of the two is:

http://hadoop.apache.org/common/docs/current/hadoop-default.html#mapred.tasktracker.map.tasks.maximum

It is still broken even if you remove the anchor from the end of the URL, which is to say the hadoop-default.html page doesn't exist at all. In fact, it is difficult to find any official documentation on those properties. Google searches for the terms don't turn up any proper documentation on the Apache site, just lots of back-and-forth forum discussion about the properties.

One thing I did find was a claim that those properties are deprecated in 2.0.0:

http://hadoop.apache.org/common/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

That page indicates that they were replaced with equivalents in which the first component is now 'mapreduce', not 'mapred'. Even with the new names, however, Google still doesn't link to any formal documentation describing those properties. In fact, I have yet to find a webpage anywhere that officially states the purpose or effect of mapred(uce).tasktracker.map.tasks.maximum.

That said, I agree that the consensus of discussion and description seems to imply that these properties have a cluster-level (not job-level) effect on the number of map/reduce slots on the cluster, not the number of tasks spawned for a given job. That obviously muddies the intuition that slots correspond to cores, as I suggested in an earlier post, and I apologize for that.
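To make the slots-versus-tasks distinction concrete, here is a minimal sketch in plain Python (not Hadoop code) of how I understand the two quantities to relate. The function names are hypothetical, and the node count, the per-node value of 2, and the task count are made-up numbers for illustration. It also assumes every slot is free for this one job and that all tasks take about the same time, which a real cluster won't guarantee.

```python
def cluster_map_slots(per_node_max):
    """Total map slots on the cluster.

    per_node_max: one entry per TaskTracker, holding that node's configured
    mapred.tasktracker.map.tasks.maximum (fixed when the TaskTracker starts).
    """
    return sum(per_node_max)


def map_waves(num_map_tasks, total_slots):
    """Rounds ("waves") needed to run a job's map tasks on the given slots.

    num_map_tasks is a per-job quantity (hypothetical here); total_slots is
    the cluster-level quantity computed above.
    """
    if total_slots <= 0:
        raise ValueError("cluster has no map slots")
    # Integer ceiling division: tasks beyond the slot count wait for a later wave.
    return (num_map_tasks + total_slots - 1) // total_slots


if __name__ == "__main__":
    # Hypothetical cluster: 10 TaskTrackers, each configured with a maximum of 2.
    slots = cluster_map_slots([2] * 10)
    # Hypothetical job that spawns 50 map tasks.
    waves = map_waves(50, slots)
    print("slots:", slots, "waves:", waves)
```

The point is only that the slot count depends on the cluster configuration, while the task count depends on the job; changing one does not change the other.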
________________________________________________________________________________
Keith Wiley     kwi...@keithwiley.com     keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
________________________________________________________________________________