Re: Spill and Map Output

Pedro Costa Wed, 22 Dec 2010 13:54:40 -0800

So, I conclude that a partition is defined by its offset.
But suppose, for example, that a map task produces 5 partitions. How does
the reducer know that it must fetch those 5 partitions? Where is this
information stored? It cannot come from the offset alone.


On Wed, Dec 22, 2010 at 9:07 PM, Ravi Gummadi <gr...@yahoo-inc.com> wrote:
> Each map task generates a single intermediate file (i.e. the map output
> file). It is obtained by merging multiple spills, if any spills
> happened.
>
> The index file gives the offset and length for each reducer. The offset
> is the position in the map output file where the input data for that
> particular reducer starts, and the length is the size of the data
> starting from that offset.
>
> -Ravi
>
>
> On 12/23/10 2:17 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:
>
> Hi,
>
> 1 - I would like to understand how a partition works in MapReduce.
> I know that MapReduce contains the IndexRecord class, which
> indicates the length of something. Is it the length of a partition or
> of a spill?
>
> 2 - For a large map output, can a partition be a set of spills, or is a
> spill simply the same thing as a partition?
>
> Thanks,
> --
> Pedro
>
>
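
The index layout Ravi describes above can be sketched roughly as follows. This is only an illustration, not Hadoop's actual code: the class MapOutputIndexSketch, its Entry type, and the two-longs-per-entry layout are assumptions made up for this example (the real IndexRecord may carry more fields). It assumes the partitions are laid out back to back in the map output file, with one index entry per reducer. In Hadoop the number of partitions equals the number of reduce tasks, and each reducer asks only for the entry at its own partition number.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a per-map-output index; not Hadoop's real classes.
final class MapOutputIndexSketch {

    // One entry per reducer: where its data starts and how many bytes it spans.
    static final class Entry {
        final long offset;
        final long length;

        Entry(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Assumed on-disk size of one entry: two 8-byte longs (offset, length).
    static final int ENTRY_BYTES = 2 * Long.BYTES;

    // Builds the index from per-partition sizes, assuming the partitions
    // are stored back to back in the map output file.
    static ByteBuffer buildIndex(long[] partitionLengths) {
        ByteBuffer buf = ByteBuffer.allocate(partitionLengths.length * ENTRY_BYTES);
        long offset = 0;
        for (long len : partitionLengths) {
            buf.putLong(offset).putLong(len);
            offset += len;
        }
        buf.flip();
        return buf;
    }

    // The number of partitions follows from the size of the index itself.
    static int partitionCount(ByteBuffer index) {
        return index.remaining() / ENTRY_BYTES;
    }

    // Reducer r reads only entry r, using absolute (position-independent) reads.
    static Entry lookup(ByteBuffer index, int reducer) {
        int pos = reducer * ENTRY_BYTES;
        return new Entry(index.getLong(pos), index.getLong(pos + Long.BYTES));
    }

    public static void main(String[] args) {
        // Five partitions, one of them empty.
        ByteBuffer index = buildIndex(new long[] {100, 0, 250, 40, 10});
        Entry e = lookup(index, 2);
        System.out.println("partitions=" + partitionCount(index)
                + " reducer 2: offset=" + e.offset + " length=" + e.length);
    }
}
```

Under these assumptions, a partition with no data still gets an entry (with length 0), so the entry for reducer r is always found at position r in the index.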



-- 
Pedro
