partLength is the compressed length, since the map output data may be compressed
depending on a config setting.
rawLength is the uncompressed length.

sortAndSpill() in MapTask.java sets these fields as follows:

    rec.startOffset = segmentStart;
    rec.rawLength = writer.getRawLength();
    rec.partLength = writer.getCompressedLength();
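To make the rawLength / partLength distinction concrete, here is a minimal, self-contained Java sketch that uses java.util.zip instead of Hadoop's IFile writer and compression codecs. The class SegmentLengths and its write method are hypothetical. The sketch also ignores any framing or checksum bytes the real writer may add, so its numbers will not match Hadoop's exactly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical illustration (not Hadoop code) of raw vs. stored length for one partition. */
final class SegmentLengths {
    final long rawLength;  // bytes handed to the writer, before compression
    final long partLength; // bytes that actually end up in the output

    private SegmentLengths(long rawLength, long partLength) {
        this.rawLength = rawLength;
        this.partLength = partLength;
    }

    /** Writes one partition's bytes, optionally through a deflate stream, and records both lengths. */
    static SegmentLengths write(byte[] partitionData, boolean compress) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        if (compress) {
            // close() at the end of the try block finishes the deflate stream and flushes it into sink
            try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
                out.write(partitionData, 0, partitionData.length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        } else {
            sink.write(partitionData, 0, partitionData.length);
        }
        return new SegmentLengths(partitionData.length, sink.size());
    }

    public static void main(String[] args) {
        byte[] data = "key\tvalue\n".repeat(1000).getBytes(StandardCharsets.UTF_8);
        SegmentLengths plain = write(data, false);
        SegmentLengths zipped = write(data, true);
        System.out.println("uncompressed: raw=" + plain.rawLength + " part=" + plain.partLength);
        System.out.println("compressed:   raw=" + zipped.rawLength + " part=" + zipped.partLength);
    }
}
```

With the repetitive 10,000-byte input in main, partLength equals rawLength when compression is off, and is much smaller than rawLength when it is on.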

-Ravi

On 12/23/10 3:52 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:

An index record contains three fields:
startOffset, rawLength and partLength.

What's the difference between a raw length and a partition length?


On Wed, Dec 22, 2010 at 10:05 PM, Ravi  Gummadi <gr...@yahoo-inc.com> wrote:
> Each map task produces R partitions (as part of its output file) if the
> number of reduce tasks for the job is R.
> A reduce task asks the TaskTracker where the map ran for its input. The
> TaskTracker gives it the corresponding partition of the map output file
> based on the reduce task id. For example, the TaskTracker gives the k-th
> partition to reduce task xxx_r_00000k.
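A minimal sketch of the routing described above, assuming the default hash partitioning: each record goes to one of R partitions, and reduce task k later reads partition k. The formula is the one used by Hadoop's default HashPartitioner; the class and method names below are hypothetical, and the buckets are plain in-memory lists rather than file segments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing how one map task's keys split into R partitions. */
final class PartitionRouting {
    /** Same formula as Hadoop's default HashPartitioner: non-negative hash modulo R. */
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Splits keys into R buckets; bucket k is what reduce task k would fetch from this map. */
    static List<List<String>> partition(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int k = 0; k < numReduceTasks; k++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return buckets;
    }
}
```

Because every map task applies the same function with the same R, all records for a given key land in the same partition number in every map output file, so reduce task k only needs partition k from each map.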
>
> -Ravi
>
> On 12/23/10 3:24 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:
>
> So, I conclude that a partition is defined by the offset.
> But suppose, for example, that a map task produces 5 partitions. How does
> the reducer know that it must fetch those 5 partitions? Where is this
> information? It is not given by the offset alone.
>
> On Wed, Dec 22, 2010 at 9:07 PM, Ravi  Gummadi <gr...@yahoo-inc.com> wrote:
>> Each map task generates a single intermediate file (i.e. the map output
>> file). This file is obtained by merging multiple spills, if spills had to
>> happen.
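A simplified, in-memory model of that merge: each spill holds R sorted segments (one per partition), and the final output still has R partitions, where partition p is a k-way merge of the p-th segment of every spill. In other words, a spill is not the same thing as a partition; each spill contains a piece of every partition. Hadoop's mergeParts() does this on disk; SpillMerge and mergeSpills are hypothetical names, and records here are plain strings compared in natural order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical in-memory model: spills.get(s).get(p) is the sorted segment of partition p in spill s. */
final class SpillMerge {
    static List<List<String>> mergeSpills(List<List<List<String>>> spills, int numPartitions) {
        List<List<String>> merged = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            final int part = p;
            // Heap entries are {spillIndex, positionInSegment}, ordered by the record they point at.
            PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> spills.get(a[0]).get(part).get(a[1])
                              .compareTo(spills.get(b[0]).get(part).get(b[1])));
            for (int s = 0; s < spills.size(); s++) {
                if (!spills.get(s).get(p).isEmpty()) {
                    heap.add(new int[] {s, 0});
                }
            }
            // Repeatedly take the smallest head record and advance that spill's cursor.
            List<String> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                int[] top = heap.poll();
                List<String> segment = spills.get(top[0]).get(p);
                out.add(segment.get(top[1]));
                if (top[1] + 1 < segment.size()) {
                    heap.add(new int[] {top[0], top[1] + 1});
                }
            }
            merged.add(out);
        }
        return merged;
    }
}
```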
>>
>> The index file gives the offset and length for each reducer.
>> The offset is the position in the map output file where the input data for
>> that particular reducer starts, and the length is the size of the data
>> starting from that offset.
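A small sketch of how such an index can be built and used, mirroring the segmentStart / rawLength / partLength assignments in the sortAndSpill() snippet at the top of the thread: partitions are written back to back, and each index record remembers where its partition starts and how many bytes it occupies. SimpleIndexRecord and MapOutputIndex are hypothetical stand-ins (not Hadoop's IndexRecord or its spill-index file format), the "file" is an in-memory byte array, and no compression is applied, so partLength equals rawLength in this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an index record: one entry per partition. */
final class SimpleIndexRecord {
    final long startOffset; // where this partition's bytes begin in the map output
    final long rawLength;   // uncompressed size
    final long partLength;  // bytes stored (equal to rawLength here, since nothing is compressed)

    SimpleIndexRecord(long startOffset, long rawLength, long partLength) {
        this.startOffset = startOffset;
        this.rawLength = rawLength;
        this.partLength = partLength;
    }
}

final class MapOutputIndex {
    /** Writes partitions back to back into file and returns one index record per partition. */
    static SimpleIndexRecord[] writePartitions(List<byte[]> partitions, ByteArrayOutputStream file) {
        SimpleIndexRecord[] index = new SimpleIndexRecord[partitions.size()];
        for (int p = 0; p < partitions.size(); p++) {
            long segmentStart = file.size();
            byte[] data = partitions.get(p);
            file.write(data, 0, data.length);
            index[p] = new SimpleIndexRecord(segmentStart, data.length, file.size() - segmentStart);
        }
        return index;
    }

    /** The bytes served to reduce task k: [startOffset, startOffset + partLength) of record k. */
    static byte[] segmentFor(byte[] mapOutput, SimpleIndexRecord[] index, int reduceTask) {
        SimpleIndexRecord rec = index[reduceTask];
        int from = Math.toIntExact(rec.startOffset);
        int to = Math.toIntExact(rec.startOffset + rec.partLength);
        return Arrays.copyOfRange(mapOutput, from, to);
    }
}
```

An empty partition still gets a record (with a length of 0), so record k always corresponds to reduce task k.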
>>
>> -Ravi
>>
>>
>> On 12/23/10 2:17 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:
>>
>> Hi,
>>
>> 1 - I would like to understand how a partition works in MapReduce.
>> I know that MapReduce contains the IndexRecord class, which
>> indicates the length of something. Is it the length of a partition or
>> of a spill?
>>
>> 2 - With large map output, can a partition be a set of spills, or is a
>> spill simply the same thing as a partition?
>>
>> Thanks,
>> --
>> Pedro
>>
>>
>
>
>
> --
> Pedro
>
>



--
Pedro
