Ted Dunning wrote:
You don't need to do all that work.
Just set:
<property>
<name>mapred.reduce.tasks</name>
<value>4</value>
<description>The default number of reduce tasks per job. Typically set
to a prime several times greater than number of available hosts.
Ignored when mapred.job.tracker is "local".
</description>
</property>
Either in hadoop-site.xml or in your program using
conf.setInt("mapred.reduce.tasks", 4);
That will give you 4 reduce threads. You can have lots more than that if
you like.
Err... no.
*mapred.reduce.tasks* is the default no. of reduces for a job.
I think the config knob you want is *mapred.tasktracker.tasks.maximum*.
(http://lucene.apache.org/hadoop/hadoop-default.html#mapred.tasktracker.tasks.maximum)
That, btw, is the maximum no. of tasks of a given kind (map or reduce)
which can run simultaneously on a given tasktracker (each in a separate
JVM). This is a cluster-wide limit, and there are JIRA issues open to
make it a per-tracker knob (HADOOP-1245 & HADOOP-1274).
Arun
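To make the difference between the two knobs concrete, here is a rough sketch of the arithmetic, assuming every tasktracker uses the same mapred.tasktracker.tasks.maximum and that reduces are scheduled in even "waves" (an idealization; real scheduling is less tidy). The class and method names are made up for illustration and are not part of Hadoop.

```java
// Hypothetical helper, not a Hadoop class: relates mapred.reduce.tasks
// (how many reduces a job has) to mapred.tasktracker.tasks.maximum
// (how many tasks of one kind a single tasktracker runs at once).
class ReduceSlots {

    // Upper bound on reduce tasks running at the same time across the cluster.
    static int maxConcurrentReduces(int taskTrackers, int tasksMaximum) {
        return taskTrackers * tasksMaximum;
    }

    // Idealized number of scheduling waves needed to run all reduces,
    // i.e. ceil(reduceTasks / slots).
    static int waves(int reduceTasks, int taskTrackers, int tasksMaximum) {
        int slots = maxConcurrentReduces(taskTrackers, tasksMaximum);
        return (reduceTasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        // Example numbers only: 4 tasktrackers, tasks.maximum = 2, 20 reduces.
        System.out.println("concurrent reduces: " + maxConcurrentReduces(4, 2));
        System.out.println("waves: " + waves(20, 4, 2));
    }
}
```

So raising mapred.reduce.tasks alone gives the job more reduces, but how many of them run side by side on one machine is capped by the tasktracker setting.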
On 10/3/07 6:50 PM, "Nguyen Manh Tien" <[EMAIL PROTECTED]> wrote:
I know in Hadoop we can implement multi-threaded, asynchronous mapping with
class MapRunnable. But there is no similar class for multi-threading in the
reduce phase. Could we do multi-threading in the reduce phase?
Does the following code work?
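public void reduce(WritableComparable key, Iterator values,
        OutputCollector output, Reporter reporter) {
    // transfer the key, one value, and the OutputCollector to the thread
    new SomeThread(key, (Writable) values.next(), output).start();
}
public class SomeThread extends Thread {
    private final WritableComparable key;
    private final Writable value;
    private final OutputCollector output;
    public SomeThread(WritableComparable key, Writable value,
            OutputCollector output) {
        this.key = key;
        this.value = value;
        this.output = output;
    }
    public void run() {
        try {
            output.collect(key, value);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}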
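One concern with the snippet above is that reduce() returns right after starting the thread, so collect() may be called after the framework considers that call finished. A minimal sketch of a safer pattern follows, assuming the collector is not safe to use from several threads at once or after reduce() has returned. The Collector interface is a hypothetical stand-in for OutputCollector so the example compiles without Hadoop, and the thread count is arbitrary. Depending on the Hadoop version, value objects from the iterator may also be reused between calls to next(), in which case each value would need to be copied before it is handed to a worker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for Hadoop's OutputCollector (assumption, not the real API).
interface Collector<K, V> {
    void collect(K key, V value) throws Exception;
}

class ThreadedReduceSketch {

    // Hands each value to a worker thread, but does not return until every
    // worker has finished, so the collector is never touched after the call.
    static <K, V> void reduce(final K key, Iterator<V> values,
                              final Collector<K, V> output, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        try {
            while (values.hasNext()) {
                // Read the iterator on the calling thread only.
                final V value = values.next();
                pending.add(pool.submit(() -> {
                    // Any expensive per-value work would go here, outside the lock.
                    synchronized (output) {
                        output.collect(key, value); // one writer at a time
                    }
                    return null;
                }));
            }
            // Wait for all workers; get() rethrows any failure from a worker.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        final List<String> collected = new ArrayList<>();
        Collector<String, Integer> out = (k, v) -> collected.add(k + "=" + v);
        reduce("word", Arrays.asList(1, 2, 3).iterator(), out, 2);
        System.out.println(collected.size() + " records collected");
    }
}
```

The order in which records reach the collector is not deterministic here, so this only fits jobs where the output order within a key does not matter.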