Hi all,

No matter what I try, the number of mapper tasks on a given machine is always 2. JobConf.setNumMapTasks(X) has no effect, nor does setting mapred.map.tasks in the mapred-default.xml configuration. Why are these settings ignored? How can I truly increase the number of map tasks on a given machine?
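
For concreteness, a minimal sketch of the property form of that setting is below. The value 20 is only a placeholder for illustration; it is not a number taken from the job described here.

```xml
<!-- Illustrative only: 20 is a placeholder value, not the one used in this job -->
<property>
  <name>mapred.map.tasks</name>
  <value>20</value>
</property>
```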

I ran a job last night (using Hadoop 0.14.1) that took 31.5 minutes to map 7.5 GB (on HDFS, not s3fs) and then 78 seconds to reduce the results of that map (the reduce was already 15% complete when the map phase hit 100%). The map took so long because only 6 to 8 of the 171 mappers were running at any one time. I'd really like to know how to move the needle on this one, so any insight would be much appreciated.

Thanks.

-- 
Toby DiPasquale
