Just to elaborate a little on what Chris said, the intermediate
map outputs are sorted and written to disk.
In reduce there are 3 phases: copy, sort-merge and reduce (when the user's
reduce function is called). As mappers complete and write
their sorted output to disk, the reduce tasks can copy their chunks over
HTTP and do a sort-merge. These 2 phases can complete
for a reducer only when all the mappers have finished, after which the user's
reduce function is invoked.
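
To make the sort-merge and reduce phases concrete, here is a rough sketch in
plain Python. It is not Hadoop code: run_reduce and sum_counts are made-up
names, the lists stand in for the sorted chunks a reducer has copied from
each mapper, and keys are grouped by simple equality rather than by a
configurable grouping comparator.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def run_reduce(map_outputs, reduce_fn):
    """Merge already-sorted map outputs and call reduce_fn once per key.

    map_outputs: list of lists of (key, value) pairs, each sorted by key
    (hypothetical stand-ins for the chunks copied from each mapper).
    """
    # Sort-merge phase: every input is already sorted by key, so a k-way
    # merge is enough; no full re-sort is needed.
    merged = heapq.merge(*map_outputs, key=itemgetter(0))

    results = []
    # Reduce phase: equal keys are now adjacent, so group them and pass
    # each key with its values to the user's reduce function.
    for key, group in groupby(merged, key=itemgetter(0)):
        results.append(reduce_fn(key, (value for _, value in group)))
    return results


def sum_counts(key, values):
    # Example user reduce function: sum the values for one key.
    return (key, sum(values))


# Three sorted map outputs, as if fetched from three finished mappers.
map_outputs = [
    [("apple", 1), ("cherry", 2)],
    [("apple", 3), ("banana", 1)],
    [("banana", 4), ("cherry", 1)],
]
result = run_reduce(map_outputs, sum_counts)
print(result)
```

The merge can only be finished once every mapper's chunk is available, which
is why the user's reduce function runs last.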

-Ankur

-----Original Message-----
From: Chris Douglas [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 15, 2008 4:34 AM
To: [email protected]
Subject: Re: When does reducer read mapper's intermediate result?

Not quite; the intermediate output is written to the local disk on the
node executing MapTask and fetched over HTTP by the ReduceTask. The
ReduceTask need only wait for the MapTask to complete successfully
before fetching its output, but it cannot start before all MapTasks have
finished. The intermediate output is sorted, so the ReduceTask only
needs to merge the output produced by the map and group by key (using
the grouping comparator). -C

On Jul 14, 2008, at 3:59 PM, Mori Bellamy wrote:

> i'm pretty sure that the reducer waits for all of the map tasks'  
> output to be written to HDFS (or else i see no use for the Combiner 
> class).  i'm not sure about your second question though. my gut tells 
> me "no"
>
>
> On Jul 14, 2008, at 3:50 PM, Kevin wrote:
>
>> Hi, there,
>>
>> I am interested in the implementation details of hadoop mapred. In
>> particular, does the reducer wait till a map task ends and then fetch
>> the output (key-value pairs)? If so, is the very file produced by a
>> mapper for the reducer sorted before reducer gets it? (which means
>> that the reducer only needs to do merge sort when it gets all the
>> intermediate files from different mappers).
>>
>> Best,
>> -Kevin
>
