RE: Multi-threaded Reduce

Joydeep Sen Sarma Thu, 04 Oct 2007 11:43:57 -0700

I got the impression that the original question was about
MultithreadedMapRunner.java and how something like that could be
implemented in the reduce phase as well.


Nguyen, your code looks alright, but there's no limit on the number of
threads you would end up spawning (something that the map runner avoids).
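A minimal sketch of the bounded alternative, assuming plain Java with no Hadoop on the classpath. `OutputSink` and `BoundedThreadedReducer` are hypothetical names (the interface only stands in for Hadoop's `OutputCollector`), and summing the values is illustrative per-key work. Each key's work is handed to a fixed-size pool, so the thread count never exceeds `poolSize`, unlike starting a new `Thread` on every `reduce()` call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for Hadoop's OutputCollector so the sketch
// compiles and runs without Hadoop on the classpath.
interface OutputSink<K, V> {
    void collect(K key, V value);
}

// Hypothetical reducer that hands each key's work to a fixed-size pool,
// so at most poolSize worker threads ever exist.
class BoundedThreadedReducer {
    private final ExecutorService pool;

    BoundedThreadedReducer(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    void reduce(String key, Iterator<Integer> values, OutputSink<String, Integer> output) {
        // Copy the values before returning, in case the caller reuses or
        // advances the iterator afterwards (an assumption about the framework).
        List<Integer> copy = new ArrayList<>();
        while (values.hasNext()) {
            copy.add(values.next());
        }
        pool.execute(() -> {
            int sum = 0; // illustrative per-key work: sum the values
            for (int v : copy) {
                sum += v;
            }
            // The sink is not assumed to be thread-safe, so calls are serialized.
            synchronized (output) {
                output.collect(key, sum);
            }
        });
    }

    // Call after the last reduce(): waits (up to a minute) for queued work to finish.
    void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A fixed pool caps threads but not queued work; if memory matters, a `ThreadPoolExecutor` with a bounded queue would also cap the number of pending keys. Because results are collected asynchronously, their output order is not guaranteed.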

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 04, 2007 10:27 AM
To: [email protected]
Subject: Re: Multi-threaded Reduce


Arun,

I think that you are strictly correct, but that the original questioner
simply needed some parallelism for reduces, not necessarily parallelism
on a single node.

I could be very wrong.  It is always difficult to determine what a
question means, of course, since the person asking the question generally
doesn't understand something about the system (hence the question).


On 10/4/07 1:13 AM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:

> Ted Dunning wrote:
>> You don't need to do all that work.
>> 
>> Just set:
>> 
>> <property>
>>   <name>mapred.reduce.tasks</name>
>>   <value>4</value>
>>   <description>The default number of reduce tasks per job.  Typically
>>   set to a prime close to the number of available hosts.
>>   Ignored when mapred.job.tracker is "local".
>>   </description>
>> </property>
>> 
>> Either in hadoop-site.xml or in your program using
>> conf.setInt("mapred.reduce.tasks", 4)
>> 
>> That will give you 4 reduce threads.  You can have lots more than
>> that if you like.
>> 
> 
> Err... no.
> 
> *mapred.reduce.tasks* is the default no. of reduces for a job.
> 
> I think the config knob you want is *mapred.tasktracker.tasks.maximum*.
> (http://lucene.apache.org/hadoop/hadoop-default.html#mapred.tasktracker.tasks.maximum)
> 
> That, btw, is the maximum no. of tasks of a given kind (map or reduce)
> which can be simultaneously running on a given tasktracker (separate
> JVMs). This is a cluster-wide limit, and there are JIRA issues open to
> make that a per-tracker knob (HADOOP-1245 & HADOOP-1274).
> 
> Arun
> 
>> 
>> On 10/3/07 6:50 PM, "Nguyen Manh Tien" <[EMAIL PROTECTED]> wrote:
>> 
>> 
>>> I know that in Hadoop we can implement multi-threaded, asynchronous
>>> mapping with MapRunnable. But there doesn't seem to be a similar class
>>> for multi-threading in the reduce phase. Can we multi-thread the reduce
>>> phase?
>>> Does the following code work?
>>> 
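>>> public void reduce(WritableComparable key, Iterator values,
>>>                    OutputCollector output, Reporter reporter) {
>>>     // transfer the key, values and OutputCollector to a new thread
>>>     new SomeThread(key, values, output).start();
>>> }
>>> 
>>> public class SomeThread extends Thread {
>>>   WritableComparable key;
>>>   Iterator values;
>>>   OutputCollector output;
>>>   public SomeThread(WritableComparable key, Iterator values,
>>>                     OutputCollector output) {
>>>     this.key = key;
>>>     this.values = values;
>>>     this.output = output;
>>>   }
>>>   public void run() {
>>>     try {
>>>       while (values.hasNext()) {
>>>         output.collect(key, (Writable) values.next());
>>>       }
>>>     } catch (IOException e) {
>>>       throw new RuntimeException(e);
>>>     }
>>>   }
>>> }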
>> 
>> 
>
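Arun's description of *mapred.tasktracker.tasks.maximum* amounts to a per-tracker count of task slots. The toy model below is plain Java threads, not Hadoop code (`TrackerSlots` is a hypothetical name): a counting `Semaphore` lets at most `maxTasks` tasks of one kind run at once however many are offered, and the class records the peak concurrency it observed. Real tasks run in separate JVMs, which this single-process sketch does not attempt to model.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Toy, single-process model of a per-tracker cap on simultaneously
// running tasks of one kind (not how the TaskTracker is implemented).
class TrackerSlots {
    private final Semaphore slots;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    TrackerSlots(int maxTasks) {
        this.slots = new Semaphore(maxTasks);
    }

    // Blocks until a slot is free, runs the task while holding it,
    // and records the highest number of tasks seen running together.
    void runTask(Runnable task) throws InterruptedException {
        slots.acquire();
        try {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            task.run();
        } finally {
            running.decrementAndGet();
            slots.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }
}
```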
