Robert,

On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:

> Hi,
> 
> I have some questions related to basic functionality in Hadoop. 
> 
> 1. When a Mapper processes the intermediate output data, how does it know how 
> many partitions to create (i.e. how many reducers there will be) and how much 
> data should go into each partition for each reducer?
> 
> 2. When the JobTracker assigns a task to a reducer, it also specifies the 
> locations of the intermediate output data that the reducer should retrieve, 
> right? But how does a reducer know which portion to retrieve from each remote 
> location holding intermediate output?

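To add to Harsh's comment: essentially, the TT (TaskTracker) *knows* where the 
output for a given map-id/reduce-id pair is present via an output-file/index-file 
combination. The index file records where each reduce partition starts in the 
map's output file and how long it is, so the TT can serve just that slice to the 
reducer that asks for it.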
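
A minimal sketch of the output-file/index-file idea, in case it helps. This is 
illustrative Python, not Hadoop code: the function names and the record layout 
(three 8-byte integers per reduce partition: start offset, raw length, stored 
length) are assumptions made for the example, not a description of Hadoop's 
on-disk format.

```python
import struct

# Assumed layout for illustration only: one fixed-size record per reduce
# partition, holding (start_offset, raw_length, part_length) as 8-byte
# big-endian integers.
RECORD = struct.Struct(">qqq")


def write_map_output(partitions):
    """Concatenate per-reducer byte strings into one output blob.

    Returns (output_bytes, index_bytes). The index gets one record per
    partition, in reduce-id order.
    """
    out = bytearray()
    index = bytearray()
    for data in partitions:
        # No compression in this sketch, so raw and stored lengths match.
        index += RECORD.pack(len(out), len(data), len(data))
        out += data
    return bytes(out), bytes(index)


def read_partition(output, index, reduce_id):
    """Return only the slice of the map output meant for reduce_id."""
    start, _raw_len, part_len = RECORD.unpack_from(index, reduce_id * RECORD.size)
    return output[start:start + part_len]


# One map task producing output for three reducers (reducer 1 gets nothing).
output, index = write_map_output([b"apple\t1\n", b"", b"kiwi\t2\nplum\t5\n"])
segment_for_reducer_2 = read_partition(output, index, 2)
```

The point of the sketch is that the lookup is a fixed-size seek into the index 
followed by one ranged read of the output, so whoever serves the map output 
never has to scan or parse the data to find a reducer's portion.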

Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

