So I conclude that a partition is defined by its offset. But suppose, for example, that a map task produces 5 partitions. How does the reducer know that it must fetch those 5 partitions? Where is this information kept? It cannot come from the offset alone.

On Wed, Dec 22, 2010 at 9:07 PM, Ravi Gummadi <gr...@yahoo-inc.com> wrote:
> Each map task will generate a single intermediate file (i.e. Map output
> file). This is obtained by merging multiple spills, if spills needed to
> happen.
>
> Index file gives the details of the offset and length for each reducer.
> Offset is offset in the map output file where the input data for the
> particular reducer starts and length is the size of the data starting from
> the offset.
>
> -Ravi
>
>
> On 12/23/10 2:17 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:
>
> Hi,
>
> 1 - I would like to understand how a partition works in the Map
> Reduce. I know that the Map Reduce contains the IndexRecord class that
> indicates the length of something. Is it the length of a partition or
> of a spill?
>
> 2 - In large map output, a partition can be a set of spills, or a
> spill is simple the same thing as a partition?
>
> Thanks,
> --
> Pedro
>
>

--
Pedro
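
To make the offset/length idea from the quoted reply concrete, here is a minimal sketch in Python. It is not Hadoop code: IndexEntry, build_map_output and fetch_partition are hypothetical names invented for illustration, and the map output is just an in-memory byte string. The sketch assumes one index entry per reducer, stored in reducer order, so the number of entries equals the number of reduce tasks and reducer i only ever looks up entry i.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical stand-in for one record of the index file."""
    offset: int  # where this reducer's data starts in the map output
    length: int  # how many bytes belong to this reducer


def build_map_output(partitions):
    """Concatenate the per-reducer chunks into one output and
    record an (offset, length) entry for each of them."""
    data = bytearray()
    index = []
    for chunk in partitions:
        index.append(IndexEntry(offset=len(data), length=len(chunk)))
        data.extend(chunk)
    return bytes(data), index


def fetch_partition(map_output, index, reducer_id):
    """Return only the bytes destined for the given reducer."""
    entry = index[reducer_id]
    return map_output[entry.offset:entry.offset + entry.length]


# One map task, three reducers; reducer 1 happens to receive no data.
partitions = [b"apple,1;", b"", b"kiwi,2;lime,5;"]
map_output, index = build_map_output(partitions)

for reducer_id in range(len(index)):
    print(reducer_id, index[reducer_id], fetch_partition(map_output, index, reducer_id))
```

Under these assumptions the offset alone is indeed not enough: the reader needs the reducer number to pick the right entry, and the length to know where that reducer's data ends.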