Re: Multi-threaded Reduce

Arun C Murthy Thu, 04 Oct 2007 01:36:02 -0700

Ted Dunning wrote:

You don't need to do all that work.


Just set:

<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
  <description>The default number of reduce tasks per job.  Typically set
  to a prime close to the number of available hosts.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

Either in hadoop-site.xml or in your program using
conf.setInt("mapred.reduce.tasks", 4);

That will give you 4 reduce threads.  You can have lots more than that if
you like.


Err... no.

*mapred.reduce.tasks* is the default no. of reduces for a job.

I think the config knob you want is *mapred.tasktracker.tasks.maximum*.
(http://lucene.apache.org/hadoop/hadoop-default.html#mapred.tasktracker.tasks.maximum)

That, btw, is the maximum no. of tasks of a given kind (map or reduce) which can be simultaneously running on a given tasktracker (in separate JVMs). This is a cluster-wide limit, and there are jira issues open to make that a per-tracker knob (HADOOP-1245 & HADOOP-1274).


Arun


On 10/3/07 6:50 PM, "Nguyen Manh Tien" <[EMAIL PROTECTED]> wrote:

I know that in Hadoop we can implement multi-threaded, asynchronous mapping
with MapRunnable. But there is no similar class for multi-threading in the
reduce phase. Can we do multi-threading in the reduce phase?
Does the following code work?

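public void reduce(WritableComparable key, Iterator values,
                    OutputCollector output, Reporter reporter) {
    // transfer OutputCollector, key, and one value to the thread
    new SomeThread(output, key, (Writable) values.next()).start();
}

public class SomeThread extends Thread {
  OutputCollector output;
  WritableComparable key;
  Writable value;
  public SomeThread(OutputCollector output, WritableComparable key,
                    Writable value) {
    this.output = output;
    this.key = key;
    this.value = value;
  }
  public void run() {
    try {
      output.collect(key, value);
    } catch (IOException e) {
      throw new RuntimeException(e);  // collect() declares IOException
    }
  }
}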
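
On the code in the question: one risk with starting a thread and returning
from reduce() right away is that the caller may move on before the thread has
called collect(). A minimal sketch of an alternative shape, using only
java.util.concurrent, is below. Collector is a hypothetical stand-in for
Hadoop's OutputCollector (not the real interface), and ThreadedReduceSketch
and the threads parameter are made up for illustration. The sketch assumes the
collector is not thread-safe, so it serializes collect() calls and waits for
every worker before returning.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadedReduceSketch {

    /** Hypothetical stand-in for Hadoop's OutputCollector. */
    public interface Collector<K, V> {
        void collect(K key, V value) throws Exception;
    }

    /**
     * Processes the values of one key on a small thread pool and
     * returns only after every worker has finished.
     */
    public static <K, V> void reduce(final K key, Iterator<V> values,
                                     final Collector<K, V> output,
                                     int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Void>> pending = new ArrayList<Future<Void>>();
        try {
            while (values.hasNext()) {
                // If the framework reuses value objects between next()
                // calls, each value would need to be copied here first.
                final V value = values.next();
                pending.add(pool.submit(new Callable<Void>() {
                    public Void call() throws Exception {
                        // Expensive per-value work would go here.
                        // Assumption: the collector is not thread-safe,
                        // so only one thread calls collect() at a time.
                        synchronized (output) {
                            output.collect(key, value);
                        }
                        return null;
                    }
                }));
            }
            // Block until all workers are done; a worker failure is
            // rethrown here wrapped in an ExecutionException.
            for (Future<Void> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the output order across threads is not deterministic, and the
threads only help when the per-value work is expensive compared with the
collect() call itself.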
