Hi all, I have been trying to figure out why all mappers run on only one machine when I have a 4-node cluster. The reduce phase runs correctly across all 4 nodes. I am using Hadoop 0.20.2. My input is a single large file (10 GB).
Here is my config in mapred-site.xml. I set mapred.map.tasks to 30, but I see only one map task, and it runs on only one machine. Are there any other parameters I need to set to get map tasks distributed evenly across the cluster?

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master-hadoop:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
    <description>map heap size for child task</description>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>5</value>
    <description></description>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>30</value>
    <description></description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>6</value>
    <description></description>
  </property>
</configuration>
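For reference, here is my rough understanding of how the old-API FileInputFormat turns mapred.map.tasks into a number of splits, written as a small Python sketch rather than the actual Hadoop code. The function names are my own, and the 64 MB block size and 1-byte minimum split size are assumptions (the 0.20 defaults), not values I have checked on my cluster. If this is right, mapred.map.tasks is only a hint, and a 10 GB splittable file should produce far more than one map task.

```python
# Sketch of the split arithmetic in org.apache.hadoop.mapred.FileInputFormat
# (old API). Helper names are hypothetical; block size and min split size
# are assumed defaults, not values read from a real cluster.

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the others


def split_size(total_size, requested_maps, block_size, min_size=1):
    """Return the size in bytes of each input split."""
    # mapred.map.tasks only sets a goal size; the block size caps it.
    goal_size = total_size // max(requested_maps, 1)
    return max(min_size, min(goal_size, block_size))


def num_splits(total_size, requested_maps, block_size, min_size=1,
               splittable=True):
    """Return how many splits (and so map tasks) one file would produce."""
    if not splittable:
        # A non-splittable input (for example a gzip file) is one split.
        return 1
    size = split_size(total_size, requested_maps, block_size, min_size)
    splits = 0
    remaining = total_size
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    ten_gb = 10 * 1024 ** 3
    block = 64 * 1024 ** 2  # assumed dfs.block.size
    print(num_splits(ten_gb, 30, block))
    print(num_splits(ten_gb, 30, block, splittable=False))
```

With these assumed values the sketch gives 160 splits for a splittable 10 GB file and 1 split for a non-splittable one, so the single map task I am seeing does not look like something mapred.map.tasks alone would explain.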