I got the impression that the original question was about MultithreadedMapRunner.java and how something like that could be implemented in the reduce phase as well.
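For what it's worth, here is a rough sketch of how a bounded pool of workers might look on the reduce side. It is plain Java so that it runs without Hadoop: `Collector` is a hypothetical stand-in for OutputCollector, `BoundedReduceSketch` and `MAX_THREADS` are made-up names, and the summing is just placeholder work. It also assumes the collector is not thread-safe and that the values iterator is only valid during the reduce call, so treat it as an illustration rather than a tested Hadoop reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for Hadoop's OutputCollector, so the sketch runs without Hadoop.
interface Collector {
    void collect(String key, int value);
}

final class BoundedReduceSketch {
    // Upper bound on worker threads (an assumed value, not a Hadoop setting).
    static final int MAX_THREADS = 4;

    private final ExecutorService pool = Executors.newFixedThreadPool(MAX_THREADS);

    // Plays the role of reduce(): work is queued on the pool instead of
    // starting one new thread per key.
    void reduce(final String key, Iterator<Integer> values, final Collector output) {
        // Copy the values first, on the assumption that the iterator
        // cannot be used safely after this method returns.
        final List<Integer> copy = new ArrayList<>();
        while (values.hasNext()) {
            copy.add(values.next());
        }
        pool.execute(() -> {
            int sum = 0;
            for (int v : copy) {
                sum += v;
            }
            // Serialize access, assuming the collector is not thread-safe.
            synchronized (output) {
                output.collect(key, sum);
            }
        });
    }

    // Plays the role of close(): wait until all queued work has finished.
    void close() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("reduce workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    // Small driver: reduces two keys and returns the sorted "key=sum" strings.
    static List<String> run() {
        final List<String> results = Collections.synchronizedList(new ArrayList<String>());
        Collector out = (k, v) -> results.add(k + "=" + v);
        BoundedReduceSketch reducer = new BoundedReduceSketch();
        reducer.reduce("a", Arrays.asList(1, 2, 3).iterator(), out);
        reducer.reduce("b", Arrays.asList(10, 20).iterator(), out);
        reducer.close();
        List<String> sorted = new ArrayList<>(results);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [a=6, b=30]
    }
}
```

The point of the fixed-size pool is that extra keys wait in the executor's queue rather than each getting a thread of its own, and the wait in close() keeps the task from finishing while output is still pending.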
Nguyen, your code looks alright, but there is no limit on the number of threads you would end up spawning (something that MultithreadedMapRunner avoids).

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 04, 2007 10:27 AM
To: [email protected]
Subject: Re: Multi-threaded Reduce

Arun,

I think that you are strictly correct, but that the original questioner simply needed some parallelism for reduces, not necessarily parallelism on a single node.

I could be very wrong. It is always difficult to determine what a question means, of course, since the person asking the question generally doesn't understand something about the system (hence the question).

On 10/4/07 1:13 AM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:

> Ted Dunning wrote:
>> You don't need to do all that work.
>>
>> Just set:
>>
>> <property>
>>   <name>mapred.reduce.tasks</name>
>>   <value>4</value>
>>   <description>The default number of map tasks per job. Typically set
>>   to a prime several times greater than number of available hosts.
>>   Ignored when mapred.job.tracker is "local".
>>   </description>
>> </property>
>>
>> Either in hadoop-site or in your program using
>> conf.set("mapred.reduce.tasks", 4)
>>
>> That will give you 4 reduce threads. You can have lots more than that if
>> you like.
>>
>
> Err... no.
>
> *mapred.reduce.tasks* is the default no. of reduces for a job.
>
> I think the config knob you want is *mapred.tasktracker.tasks.maximum*.
> (http://lucene.apache.org/hadoop/hadoop-default.html#mapred.tasktracker.tasks.maximum)
>
> That, btw, is the maximum no. of tasks of a given kind (map or reduce)
> which can be simultaneously running on a given tasktracker (separate
> jvms). This is a cluster-wide limit, and there are jira issues open to
> make that a per-tracker knob (HADOOP-1245 & HADOOP-1274).
>
> Arun
>
>>
>> On 10/3/07 6:50 PM, "Nguyen Manh Tien" <[EMAIL PROTECTED]> wrote:
>>
>>> I know in Hadoop we can implement multi-threaded, asynchronous mapping with
>>> class MapRunnable. But this don't exist the similar class to do
>>> multi-threaded in reduce phrase. Could we do milti-thread in reduce phrase?.
>>> Does the following code work?
>>>
>>> public void reduce(WritableComparable key, Iterator values,
>>>     OutputCollector output, Reporter reporter) {
>>>   new SomeThread(output).start(); // transfer OutputCollector to thread
>>> }
>>>
>>> public class SomeThread extend Thread {
>>>   OutputCollector ouput;
>>>   public SomeThread(OutputCollector output) {
>>>     this.output = output;
>>>   }
>>>   public void run() {
>>>     output.collect(key, value);
>>>   }
>>> }
>>
