That's a good point. I was indeed using a gzip file containing a CSV file. I uncompressed it, used the CSV file directly, and now I can see many mappers running concurrently.
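For anyone who finds this thread later, the behaviour Steve describes can be reproduced with a little arithmetic. The Java below is a simplified model, not Hadoop's code. It assumes the split logic of the old org.apache.hadoop.mapred.FileInputFormat in 0.20.x. Under that logic, a file whose suffix maps to a compression codec becomes a single split; the suffixes .gz, .bz2 and .deflate are assumed here. Every other file is cut into pieces of max(minSize, min(totalSize / mapred.map.tasks, blockSize)). The names SplitEstimate, looksSplittable and countSplits are hypothetical, and the 64 MB block size is an assumed dfs.block.size.

```java
/**
 * Simplified model (not Hadoop source) of how the old-API FileInputFormat in
 * Hadoop 0.20.x turns a single input file into map splits.
 */
public class SplitEstimate {

    // The last split may be up to 10% larger instead of producing a tiny extra split.
    private static final double SPLIT_SLOP = 1.1;

    /** Assumption: a codec suffix means the file is read as one unsplittable stream. */
    static boolean looksSplittable(String fileName) {
        return !(fileName.endsWith(".gz")
                || fileName.endsWith(".bz2")
                || fileName.endsWith(".deflate"));
    }

    /** Mirrors max(minSize, min(goalSize, blockSize)). */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Number of map tasks one file would produce under this model. */
    static int countSplits(long fileSize, int requestedMaps, long blockSize,
                           long minSplitSize, boolean splittable) {
        if (fileSize <= 0 || !splittable) {
            return 1; // empty or unsplittable file: exactly one split, hence one mapper
        }
        long goalSize = fileSize / Math.max(requestedMaps, 1); // mapred.map.tasks is only a hint
        long splitSize = computeSplitSize(goalSize, Math.max(minSplitSize, 1), blockSize);
        long remaining = fileSize;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // leftover tail becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long tenGb = 10L * 1024 * 1024 * 1024;
        long block = 64L * 1024 * 1024; // assumed dfs.block.size (64 MB)
        // prints "data.csv.gz -> 1"
        System.out.println("data.csv.gz -> "
                + countSplits(tenGb, 30, block, 1, looksSplittable("data.csv.gz")));
        // prints "data.csv    -> 160"
        System.out.println("data.csv    -> "
                + countSplits(tenGb, 30, block, 1, looksSplittable("data.csv")));
    }
}
```

In this model, mapred.map.tasks only lowers the goal size. It can push the split count above one per block, but it can never split a gzip file. With the uncompressed 10 GB CSV, the 64 MB blocks dominate, so the model yields 160 map tasks even though only 30 were requested.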
Thanks for the suggestion. This is an important detail that many people will miss, since a compressed format is the more natural way to pass data around. I'm not sure whether this is documented for Hadoop, but I could not find it.

Praveen

________________________________
From: ext Steve Lewis [mailto:lordjoe2...@gmail.com]
Sent: Tuesday, November 16, 2010 12:33 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Mapper runs only on one machine

Are you sure your input file is splittable? Many files (gzip, for example) are not, and such files must be processed on a single machine.

On Tue, Nov 16, 2010 at 9:24 AM, <praveen.pe...@nokia.com> wrote:

Hi all,

I have been trying to figure out why all mappers run on only one machine when I have a 4-node cluster. The reduce phase runs correctly on all 4 nodes. I am using 0.20.2. My input is a single large file (10 GB).

Here is my config in mapred-site.xml. I set mapred.map.tasks to 30, but I only see one map task, and it runs on only one machine. Are there any other parameters I need to set to get the map tasks distributed evenly across the cluster?

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master-hadoop:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.
      If "local", then jobs are run in-process as a single map and reduce task.
    </description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
    <description>map heap size for child task</description>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>5</value>
    <description></description>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>30</value>
    <description></description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>6</value>
    <description></description>
  </property>
</configuration>

--
Steven M. Lewis PhD
4221 105th Ave Ne
Kirkland, WA 98033
206-384-1340 (cell)
Institute for Systems Biology
Seattle WA