partLength is the compressed length, since the map output may be compressed depending on the job configuration. rawLength is the uncompressed length.
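To make the three fields concrete, here is a minimal sketch of how an index could be laid out and how the record for a given reduce task could be looked up. It is not the actual Hadoop code: the class names IndexRecordSketch and SpillIndexSketch and the methods build() and forReducer() are hypothetical, and the sketch assumes partitions are stored back to back in the map output file, so each partition starts where the previous one's on-disk bytes end.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an index record; not Hadoop's real class. */
final class IndexRecordSketch {
    final long startOffset; // byte offset of the partition in the map output file
    final long rawLength;   // uncompressed size of the partition
    final long partLength;  // on-disk size (compressed if compression is enabled)

    IndexRecordSketch(long startOffset, long rawLength, long partLength) {
        this.startOffset = startOffset;
        this.rawLength = rawLength;
        this.partLength = partLength;
    }
}

/** Hypothetical helper that builds and queries a per-map-output index. */
final class SpillIndexSketch {

    /**
     * Builds one record per partition. Assumption: partitions are written
     * back to back, so the offset advances by the on-disk length (partLength),
     * not by the uncompressed length (rawLength).
     */
    static List<IndexRecordSketch> build(long[] rawLengths, long[] partLengths) {
        if (rawLengths.length != partLengths.length) {
            throw new IllegalArgumentException("one raw and one part length per partition");
        }
        List<IndexRecordSketch> index = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < partLengths.length; i++) {
            index.add(new IndexRecordSketch(offset, rawLengths[i], partLengths[i]));
            offset += partLengths[i];
        }
        return index;
    }

    /** Reduce task k reads the k-th record, i.e. the k-th partition. */
    static IndexRecordSketch forReducer(List<IndexRecordSketch> index, int reduceId) {
        return index.get(reduceId);
    }
}
```

With three partitions whose on-disk lengths are 40, 25 and 60 bytes, the record for reduce task 2 would have startOffset 65 and partLength 60. When compression is off, rawLength and partLength would be expected to be close to each other; when it is on, partLength is the number of bytes to read from disk and rawLength is the size after decompression.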
sortAndSpill() in MapTask.java sets these fields as follows:

    rec.startOffset = segmentStart;
    rec.rawLength = writer.getRawLength();
    rec.partLength = writer.getCompressedLength();

-Ravi

On 12/23/10 3:52 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:

An index record contains 3 variables: startOffset, rawLength and partLength.
What's the difference between a raw length and a partition length?

On Wed, Dec 22, 2010 at 10:05 PM, Ravi Gummadi <gr...@yahoo-inc.com> wrote:
> Each map task produces R partitions (as part of its output file) if the
> number of reduce tasks for the job is R.
> The reduce task asks the TaskTrackerWhereMapRan for its input. The TaskTracker
> gives the corresponding partition in the map output file based on the reduce
> task id. For example, the TaskTracker gives the k-th partition for reduce task
> xxx_r_00000k.
>
> -Ravi
>
> On 12/23/10 3:24 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:
>
> So, I conclude that a partition is defined by the offset.
> But, for example, a Map Task produces 5 partitions. How does the reduce
> know that it must fetch the 5 partitions? Where's this information?
> This information is not only given by the offset.
>
> On Wed, Dec 22, 2010 at 9:07 PM, Ravi Gummadi <gr...@yahoo-inc.com> wrote:
>> Each map task will generate a single intermediate file (i.e. the map output
>> file). This is obtained by merging multiple spills, if spills needed to
>> happen.
>>
>> The index file gives the details of the offset and length for each reducer.
>> The offset is the position in the map output file where the input data for
>> the particular reducer starts, and the length is the size of the data
>> starting from that offset.
>>
>> -Ravi
>>
>> On 12/23/10 2:17 AM, "Pedro Costa" <psdc1...@gmail.com> wrote:
>>
>> Hi,
>>
>> 1 - I would like to understand how a partition works in MapReduce.
>> I know that MapReduce contains the IndexRecord class that
>> indicates the length of something. Is it the length of a partition or
>> of a spill?
>>
>> 2 - In large map output, can a partition be a set of spills, or is a
>> spill simply the same thing as a partition?
>>
>> Thanks,
>> --
>> Pedro
>
> --
> Pedro

--
Pedro