Hello,

I would like to control the number of map tasks run in parallel for a particular job. I found a property called mapred.map.tasks which seems to give some control over that. However, the wiki page http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces states that this value is only a hint: if it is smaller than the file size divided by the DFS block size, it has little real effect, because Hadoop will probably create as many map tasks as there are DFS blocks in the file.

Can I set this to a lower number anyway? What if I want just one map task that goes through each line of the input file, without spawning many map tasks? Is that possible?

I am also not sure whether mapred.map.tasks can be set in a specific job's conf, or only in the site-wide configuration, i.e. hadoop-site.xml or mapred-default.xml, which I don't have control over.
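For reference, the kind of per-job override I have in mind is the fragment below. This is only a sketch of what I would try: the property name is the one from the wiki, and per the wiki it is just a hint, so I don't know whether it would actually reduce the number of maps below the block count:

```xml
<!-- Hypothetical per-job configuration fragment; mapred.map.tasks is
     documented as a hint only, so the framework may still create one
     map task per DFS block of the input file. -->
<property>
  <name>mapred.map.tasks</name>
  <value>1</value>
</property>
```

If setting it programmatically, I assume the equivalent would be calling setNumMapTasks(1) on the job's JobConf, but I haven't confirmed whether that overrides the block-based split count either.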
Thanks,
Jerr.

--
View this message in context: http://www.nabble.com/number-of-map-tasks-tf4939413.html#a14139217
Sent from the Hadoop Users mailing list archive at Nabble.com.
