Setting mapred.reduce.tasks to zero is enough; you need not modify any other configuration. Also, you need not edit the values in mapred-default.xml. Instead, you can have a mapred-site.xml with:

<property>
  <name>mapred.reduce.tasks</name>
  <value>0</value>
  <description>The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local".
  </description>
</property>

Or you can run your job with the -Dmapred.reduce.tasks=0 option.
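For reference, a complete site file carrying only this override might look like the sketch below. It assumes the usual configuration root element that Hadoop site files use, and that mapred-site.xml lives in the conf directory picked up by your job. The description text here is just my own note, not the stock one.

```xml
<?xml version="1.0"?>
<!-- Sketch of a minimal mapred-site.xml; assumes the standard configuration root element. -->
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <!-- Zero reduces: the job runs map tasks only. -->
    <value>0</value>
    <description>Site-level override: run map-only jobs by default.</description>
  </property>
</configuration>
```

Values in mapred-site.xml take precedence over mapred-default.xml, so the defaults file can stay untouched. As far as I know, the -D form on the command line only takes effect when the job's driver parses generic options (for example by running through ToolRunner), so check that if the option seems to be ignored.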
Thanks
Amareshwari

On 1/6/10 4:53 PM, "psdc1978" <[email protected]> wrote:

See my question inline.

On Tue, Jan 5, 2010 at 6:32 PM, Owen O'Malley <[email protected]> wrote:
>
> On Jan 5, 2010, at 9:13 AM, psdc1978 wrote:
>
>> 1 - I would like to see the output that the Maps produce in my
>> example. Is it possible to have Hadoop run only the Map tasks,
>> excluding the Reduce tasks?
>
> Set the number of reduce tasks to 0.

I've updated the file "/opt/hadoop/src/mapred/mapred-default.xml" with the following values:

<property>
  <name>mapred.reduce.tasks</name>
  <value>0</value>
  <description>The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>0</value>
  <description>The default number of parallel transfers run by reduce during the copy(shuffle) phase.
  </description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
  <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.
  </description>
</property>

<property>
  <name>mapred.task.profile.reduces</name>
  <value>0-0</value>
  <description>To set the ranges of reduce tasks to profile. mapred.task.profile has to be set to true for the value to be accounted.
  </description>
</property>

Are these values enough to stop the reduce tasks from running? I don't think so, because I've also searched the "/tmp/hadoop-pcosta/" directory for the output of the maps, but I can't find it. Is this output written in binary?

>
>> 2 - Is the output of the Maps written to a temporary file?
>
> Each map's unsorted output will be sent to the OutputFormat, which writes it
> to the output directory.
>
>> 3 - How is the output of the maps passed to the reduce tasks? Through
>> a socket, or by reading a file on disk?
>
> MapReduce does not assume any shared disks between machines. The map outputs
> are transferred via HTTP.
>
> -- Owen
>
> -- Pedro
