Re: Only running hadoop Map tasks

Amareshwari Sri Ramadasu Wed, 06 Jan 2010 03:39:34 -0800

Setting mapred.reduce.tasks to zero is enough; you need not modify any other
configuration.
Also, you need not modify the values in mapred-default.xml. Instead, you can
override them in a mapred-site.xml with:
<property>
  <name>mapred.reduce.tasks</name>
  <value>0</value>
  <description>The default number of reduce tasks per job. Typically set to 99%
  of the cluster's reduce capacity, so that if a node fails the reduces can
  still be executed in a single wave.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>
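Alternatively, you can run your job with the -Dmapred.reduce.tasks=0 option
(this generic option is picked up when the job's driver parses its arguments
with GenericOptionsParser, e.g. via ToolRunner).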
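For reference, a site file wraps all of its properties in a single <configuration> root element. A minimal sketch of a complete conf/mapred-site.xml, assuming the Hadoop 0.20-era conf/ directory layout, that makes every job map-only by default might look like this:

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml: site-specific overrides of mapred-default.xml -->
<configuration>
  <property>
    <!-- 0 reduces: each map's output goes straight to the job's OutputFormat -->
    <name>mapred.reduce.tasks</name>
    <value>0</value>
  </property>
</configuration>
```

Because this changes the default for every job submitted with this configuration, the per-job -D option (or calling setNumReduceTasks(0) in the job driver) is usually the less intrusive choice.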

Thanks
Amareshwari

On 1/6/10 4:53 PM, "psdc1978" <[email protected]> wrote:

See my question inline.

On Tue, Jan 5, 2010 at 6:32 PM, Owen O'Malley <[email protected]> wrote:
>
> On Jan 5, 2010, at 9:13 AM, psdc1978 wrote:
>
>> 1 - I would like to see the output that the maps produce in my
>> example. Is it possible to have Hadoop run only the map tasks,
>> excluding the reduce tasks?
>
> Set the number of reduce tasks to 0.

I've updated the file "/opt/hadoop/src/mapred/mapred-default.xml" with
the following values:

<property>
  <name>mapred.reduce.tasks</name>
  <value>0</value>
  <description>The default number of reduce tasks per job. Typically set to 99%
  of the cluster's reduce capacity, so that if a node fails the reduces can
  still be executed in a single wave.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>0</value>
  <description>The default number of parallel transfers run by reduce
  during the copy(shuffle) phase.
  </description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

<property>
  <name>mapred.task.profile.reduces</name>
  <value>0-0</value>
  <description>To set the ranges of reduce tasks to profile.
  mapred.task.profile has to be set to true for the value to be accounted.
  </description>
</property>

Are these values enough to stop the reduce tasks from running? I don't think so,
because I've also searched the "/tmp/hadoop-pcosta/" directory to
find the map output, but I can't find it. Is this output
written in binary?
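Whether the files are text or binary depends on the job's OutputFormat, not on the reduce settings. In the old org.apache.hadoop.mapred API the default is TextOutputFormat, which writes one key<TAB>value line per record; with zero reduces these files land in the job's output directory (mapred.output.dir, typically as part-NNNNN files) rather than in a temporary directory. A sketch of a job-level fragment that makes the plain-text default explicit, assuming the 0.20-era property names:

```xml
<configuration>
  <!-- Old-API output format; TextOutputFormat is already the default -->
  <property>
    <name>mapred.output.format.class</name>
    <value>org.apache.hadoop.mapred.TextOutputFormat</value>
  </property>
  <!-- Keep the part files uncompressed so they stay human-readable -->
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
</configuration>
```

With these settings the map results can be inspected with, for example, hadoop fs -cat on a part file inside the job's output directory.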

>
>> 2 - Is the output of the maps written to a temporary file?
>
> Each map's unsorted output will be sent to the OutputFormat, which writes it
> to the output directory.
>
>> 3 - How is the output of the maps passed to the reduce tasks? Is it sent
>> over a socket, or read from a file on disk?
>
> MapReduce does not assume any shared disks between machines. The map outputs
> are transferred via HTTP.
>
> -- Owen
>
>

--
Pedro
