Hey,

On Thu, Feb 24, 2011 at 6:26 PM, Dongwon Kim <[email protected]> wrote:
> I've been trying to read "MapTask.java" after reading some references such
> as "Hadoop definitive guide" and
> "http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html", but
> it's quite tough to directly read the code without detailed comments.

Perhaps you can add some once you've got things cleared up ;-)

> Q2)
>
> Is it efficient to partition data first and then sort records inside each
> partition?
>
> Does it happen to avoid comparing expensive pair-wise key comparisons?

Typically you would only want sorting done within each partition, since
the different partitions are sent off to different reducers and never
need to be ordered relative to one another. Total-order partitioning may
be an exception here, perhaps.

> Q3)
>
> Are there any documents containing explanations about how such internal
> classes are implemented?

There's a very good presentation you may want to see, covering the
spill/shuffle/sort portions of the framework that your questions are about:
http://www.slideshare.net/hadoopusergroup/ordered-record-collection

HTH :)

-- 
Harsh J
www.harshj.com
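
P.S. In case it helps with Q2, here is a rough, standalone sketch of the "partition first, then sort inside each partition" idea. It is plain Java, not Hadoop code, and it is not how MapTask.java is actually written; the class and method names (PartitionThenSort, partitionFor, partitionThenSort) are made up for illustration. The bucket formula is modelled on a simple hash partitioner. The point is only that keys are compared against other keys in the same partition, never across partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionThenSort {

    // Hypothetical stand-in for a hash partitioner: maps a key to a bucket
    // in the range [0, numPartitions). The mask keeps the hash non-negative.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Assigns every key to its partition, then sorts each partition on its own.
    // No comparison is ever made between keys of two different partitions.
    static List<List<String>> partitionThenSort(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        for (List<String> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("pear", "apple", "fig", "kiwi", "date");
        List<List<String>> partitions = partitionThenSort(keys, 3);
        for (int i = 0; i < partitions.size(); i++) {
            // Each line stands for the sorted run one reducer would receive.
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Sorting k partitions of roughly n/k keys each costs about n log(n/k) comparisons in total, versus n log n for one global sort, so there is some saving when key comparisons are expensive. The main reason is still the one above: each reducer only needs its own partition in order.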
