Re: Spill and Map Output

Ravi Gummadi Wed, 22 Dec 2010 14:06:14 -0800

Each map task produces R partitions (as part of its output file) if the number 
of reduce tasks for the job is R.
A reduce task asks the TaskTracker on which the map ran for its input. That 
TaskTracker returns the corresponding partition of the map output file, based 
on the reduce task id. For example, the TaskTracker returns the k-th partition 
for reduce task xxx_r_00000k.


-Ravi

On 12/23/10 3:24 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:

So, I conclude that a partition is defined by the offset.
But, for example, suppose a map task produces 5 partitions. How does the
reduce know that it must fetch the 5 partitions? Where is this information?
This information is not given by the offset alone.

On Wed, Dec 22, 2010 at 9:07 PM, Ravi Gummadi <gr...@yahoo-inc.com> wrote:
> Each map task will generate a single intermediate file (i.e. the map output
> file). This is obtained by merging multiple spills, if spilling was
> needed.
>
> The index file gives the offset and length for each reducer. The offset is
> the position in the map output file where the input data for that
> reducer starts, and the length is the size of the data starting from
> that offset.
>
> -Ravi
>
>
> On 12/23/10 2:17 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:
>
> Hi,
>
> 1 - I would like to understand how a partition works in MapReduce.
> I know that MapReduce contains the IndexRecord class, which
> indicates the length of something. Is it the length of a partition or
> of a spill?
>
> 2 - In a large map output, can a partition be a set of spills, or is a
> spill simply the same thing as a partition?
>
> Thanks,
> --
> Pedro
>
>



--
Pedro
