If the job has R reduce tasks, each map task produces R partitions (as part of its output file). Each reduce task asks the TaskTracker where the map ran for its input. The TaskTracker returns the corresponding partition of the map output file based on the reduce task ID. For example, the TaskTracker returns the k-th partition for reduce task xxx_r_00000k.
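
As a rough illustration of the lookup described above, here is a minimal Python sketch. It is not Hadoop's actual code: IndexEntry, build_map_output, partition_of and serve_partition are hypothetical names, the map output is modeled as an in-memory byte string rather than a file on disk, and the partition number is assumed to be the trailing digits of the reduce task ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One index entry per reducer (hypothetical stand-in for an index record)."""
    offset: int  # where this reducer's data starts in the map output
    length: int  # number of bytes that belong to this reducer


def build_map_output(partitions):
    """Concatenate R partitions into one output blob plus an R-entry index."""
    blob = bytearray()
    index = []
    for data in partitions:
        index.append(IndexEntry(offset=len(blob), length=len(data)))
        blob.extend(data)
    return bytes(blob), index


def partition_of(reduce_task_id):
    """Assumption: the partition number is the trailing digits of the task ID."""
    return int(reduce_task_id.rsplit("_", 1)[1])


def serve_partition(blob, index, reduce_task_id):
    """Return only the slice of the map output meant for this reduce task."""
    entry = index[partition_of(reduce_task_id)]
    return blob[entry.offset:entry.offset + entry.length]


# A map task in a job with R = 3 reduce tasks writes 3 partitions
# (one of them may be empty).
blob, index = build_map_output([b"apple,1;", b"", b"pear,2;plum,5;"])
part = serve_partition(blob, index, "xxx_r_000002")
```

In this sketch the reduce task does not need to know the offsets itself. It only identifies itself, and the side holding the map output uses the index to pick the matching (offset, length) pair.
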

-Ravi

On 12/23/10 3:24 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:

So, I conclude that a partition is defined by the offset. But, for example, a map task produces 5 partitions. How does the reduce know that it must fetch the 5 partitions? Where's this information? This information is not only given by the offset.

On Wed, Dec 22, 2010 at 9:07 PM, Ravi Gummadi <gr...@yahoo-inc.com> wrote:
> Each map task will generate a single intermediate file (i.e. Map output
> file). This is obtained by merging multiple spills, if spills needed to
> happen.
>
> Index file gives the details of the offset and length for each reducer.
> Offset is offset in the map output file where the input data for the
> particular reducer starts and length is the size of the data starting from
> the offset.
>
> -Ravi
>
>
> On 12/23/10 2:17 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:
>
> Hi,
>
> 1 - I would like to understand how a partition works in the Map
> Reduce. I know that the Map Reduce contains the IndexRecord class that
> indicates the length of something. Is it the length of a partition or
> of a spill?
>
> 2 - In large map output, a partition can be a set of spills, or a
> spill is simply the same thing as a partition?
>
> Thanks,
> --
> Pedro
>
>

--
Pedro