That's a good point. I was indeed using a gzip file with a CSV file inside it. I 
uncompressed it, used the CSV file directly, and now I can see many mappers running 
concurrently.

Thanks for the suggestion. This is an important piece of information that many 
people will miss, since a compressed format is the more natural way of passing 
data around. I am not sure whether this is documented for Hadoop, but I could not find it.

Praveen
________________________________
From: ext Steve Lewis [mailto:lordjoe2...@gmail.com]
Sent: Tuesday, November 16, 2010 12:33 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Mapper runs only on one machine

Are you sure your input file is splittable? Many formats (gzip, for example) are not, 
and such files must be processed on a single machine.
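
To illustrate why a gzip file cannot be split, here is a minimal Python sketch 
(not Hadoop code). It compresses some made-up CSV rows and then tries to start 
decoding from the middle of the compressed bytes, which is roughly what a 
second mapper would have to do. The helper name can_decode_from and the sample 
rows are hypothetical, chosen only for this illustration.

```python
import gzip
import zlib


def can_decode_from(data: bytes, offset: int) -> bool:
    """Return True if a gzip decoder can start reading at `offset`."""
    try:
        # wbits=31 (16 + 15) makes zlib expect a gzip header at the start.
        zlib.decompressobj(31).decompress(data[offset:])
        return True
    except zlib.error:
        # No gzip header here, so the decoder has nothing to start from.
        return False


# Made-up CSV rows standing in for the real input file.
rows = "".join(f"{i},value-{i}\n" for i in range(100_000)).encode()
compressed = gzip.compress(rows)

print(can_decode_from(compressed, 0))                     # start of the file
print(can_decode_from(compressed, len(compressed) // 2))  # arbitrary midpoint
```

Decoding from offset 0 should succeed, while decoding from the midpoint should 
fail, because the gzip stream has a single header at the start and no markers 
in between. An uncompressed CSV file does not have this problem: a reader can 
seek to any offset and resume at the next newline.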

On Tue, Nov 16, 2010 at 9:24 AM, <praveen.pe...@nokia.com> wrote:
Hi all,
I have been trying to figure out why all mappers run on only one machine when I 
have a 4-node cluster. The reduce part runs correctly on all 4 nodes. I am 
using Hadoop 0.20.2. My input is a single large file (10 GB).

Here is my config in mapred-site.xml. I set mapred.map.tasks to 30, but I only see 
one map task, and that on only one machine. Are there any other parameters I 
need to set in order to distribute the map tasks evenly across the cluster?
<configuration>
        <property>
          <name>mapred.job.tracker</name>
          <value>master-hadoop:54311</value>
          <description>The host and port that the MapReduce job tracker runs
          at.  If "local", then jobs are run in-process as a single map
          and reduce task.
          </description>
        </property>
        <property>
          <name>mapred.child.java.opts</name>
          <value>-Xmx4096m</value>
          <description>map heap size for child task</description>
        </property>
        <property>
          <name>mapred.reduce.parallel.copies</name>
          <value>5</value>
          <description></description>
        </property>
        <property>
          <name>mapred.map.tasks</name>
          <value>30</value>
          <description></description>
        </property>
        <property>
          <name>mapred.reduce.tasks</name>
          <value>6</value>
          <description></description>
        </property>
</configuration>




--
Steven M. Lewis PhD
4221 105th Ave Ne
Kirkland, WA 98033
206-384-1340 (cell)
Institute for Systems Biology
Seattle WA
