Are you sure your input file is splittable? Many files (gzip-compressed ones, for example) are not, and such a file has to be read by a single map task, which means on a single machine.
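To put rough numbers on that, here is a minimal Python sketch of how, as I understand it, the old-API FileInputFormat in 0.20 arrives at its split count. Treat it as an approximation, not the actual Hadoop code: the function name estimate_map_tasks is made up for illustration, the 64 MB block size is the stock default and only an assumption about your cluster, and the 1.1 slop factor is from my memory of the source. mapred.map.tasks enters only as a hint.

```python
# Rough estimate of the number of map tasks for a single input file.
# Assumption: mirrors the old-API rule
#   split_size = max(min_split_size, min(total_size / requested_maps, block_size))
# This is a sketch for reasoning, not Hadoop's implementation.

SPLIT_SLOP = 1.1  # assumed: a last chunk up to 10% over split_size is not split again


def estimate_map_tasks(file_size, block_size, requested_maps,
                       splittable, min_split_size=1):
    """Return the approximate number of input splits (one map task each)."""
    if file_size <= 0:
        return 0
    if not splittable:
        # A non-splittable file (e.g. gzip) is always a single split.
        return 1

    goal_size = file_size // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # the final, possibly short, split
    return splits


if __name__ == "__main__":
    ten_gb = 10 * 1024 ** 3
    block = 64 * 1024 ** 2  # assumed default dfs.block.size
    print(estimate_map_tasks(ten_gb, block, 30, splittable=True))
    print(estimate_map_tasks(ten_gb, block, 30, splittable=False))
```

Under those assumptions a splittable 10GB file gets far more than 30 splits because the block size caps the split size, while a non-splittable one gets exactly one no matter what mapred.map.tasks says.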
On Tue, Nov 16, 2010 at 9:24 AM, <praveen.pe...@nokia.com> wrote:

> Hi all,
> I have been trying to figure out why all mappers run only on one machine
> when I have 4 node cluster. Reduce part is running fine on all 4 nodes
> correctly. I am using 0.20.2. My input file is a large single file (10GB)
>
> Here is my config in mapred-site.xml. I specified map.tasks as 30 but I
> only see one map task and that too only on one machine. Are there any other
> parameters I need to set in order to control uniform distribution of map
> job?
>
> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>master-hadoop:54311</value>
>     <description>The host and port that the MapReduce job tracker runs
>     at. If "local", then jobs are run in-process as a single map
>     and reduce task.
>     </description>
>   </property>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx4096m</value>
>     <description>map heap size for child task</description>
>   </property>
>   <property>
>     <name>mapred.reduce.parallel.copies</name>
>     <value>5</value>
>     <description></description>
>   </property>
>   <property>
>     <name>mapred.map.tasks</name>
>     <value>30</value>
>     <description></description>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks</name>
>     <value>6</value>
>     <description></description>
>   </property>
> </configuration>

--
Steven M. Lewis PhD
4221 105th Ave Ne
Kirkland, WA 98033
206-384-1340 (cell)
Institute for Systems Biology
Seattle WA