Arun,
I think that you are strictly correct, but that the original questioner
simply needed some parallelism for reduces, not necessarily parallelism on a
single node.
I could be very wrong. It is always difficult to determine what a question
means, of course, since the person asking the question generally doesn't
understand something about the system (hence the question).
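For what it's worth, here is how I understand the two settings side by side in
hadoop-site.xml. This is only a sketch: the values 4 and 2 are arbitrary
examples, and the comments are my paraphrase of the discussion quoted below,
not the official descriptions.

```xml
<!-- Per job: how many reduce tasks the job is split into. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
</property>

<!-- Per tasktracker: how many tasks of one kind (map or reduce) may run
     at the same time on a node, each in its own JVM. -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
</property>
```

With settings like these, as I read it, the four reduces would be spread over
the cluster's tasktrackers, with at most two of them running at once on any
one node.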
On 10/4/07 1:13 AM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:
> Ted Dunning wrote:
>> You don't need to do all that work.
>>
>> Just set:
>>
>> <property>
>> <name>mapred.reduce.tasks</name>
>> <value>4</value>
>> <description>The default number of reduce tasks per job. Typically set
>> to a prime several times greater than number of available hosts.
>> Ignored when mapred.job.tracker is "local".
>> </description>
>> </property>
>>
>> Either in hadoop-site or in your program using
>> conf.setInt("mapred.reduce.tasks", 4)
>>
>> That will give you 4 reduce threads. You can have lots more than that if
>> you like.
>>
>
> Err... no.
>
> *mapred.reduce.tasks* is the default no. of reduces for a job.
>
> I think the config knob you want is *mapred.tasktracker.tasks.maximum*.
> (http://lucene.apache.org/hadoop/hadoop-default.html#mapred.tasktracker.tasks.maximum)
>
> That, btw, is the maximum no. of tasks of a given kind (map or reduce)
> which can be simultaneously running on a given tasktracker (separate
> jvms). This is a cluster-wide limit, and there are jira issues open to
> make that a per-tracker knob (HADOOP-1245 & HADOOP-1274).
>
> Arun
>
>>
>> On 10/3/07 6:50 PM, "Nguyen Manh Tien" <[EMAIL PROTECTED]> wrote:
>>
>>
>>> I know that in Hadoop we can implement multi-threaded, asynchronous mapping
>>> with the MapRunnable class. But there doesn't seem to be a similar class for
>>> multi-threading in the reduce phase. Could we use multiple threads in the
>>> reduce phase? Does the following code work?
>>>
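>>> public void reduce(WritableComparable key, Iterator values,
>>>     OutputCollector output, Reporter reporter) {
>>>   // hand the key, the first value, and the OutputCollector to a new thread
>>>   new SomeThread(key, (Writable) values.next(), output).start();
>>> }
>>>
>>> public class SomeThread extends Thread {
>>>   private final WritableComparable key;
>>>   private final Writable value;
>>>   private final OutputCollector output;
>>>
>>>   public SomeThread(WritableComparable key, Writable value,
>>>       OutputCollector output) {
>>>     this.key = key;
>>>     this.value = value;
>>>     this.output = output;
>>>   }
>>>
>>>   public void run() {
>>>     try {
>>>       output.collect(key, value); // emit from the worker thread
>>>     } catch (java.io.IOException e) {
>>>       throw new RuntimeException(e);
>>>     }
>>>   }
>>> }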
>>
>>
>