Re: About MapTask.java

Harsh J Thu, 24 Feb 2011 06:11:31 -0800

Hey,

On Thu, Feb 24, 2011 at 6:26 PM, Dongwon Kim <[email protected]> wrote:
> I've been trying to read "MapTask.java" after reading some references such
> as "Hadoop definitive guide" and
> "http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html", but
> it's quite tough to directly read the code without detailed comments.


Perhaps you could add some once things are clear to you ;-)

> Q2)
>
> Is it efficient to partition data first and then sort records inside each
> partition?
>
> Is this done to avoid expensive pairwise key comparisons?

Typically you only need the records sorted within each partition,
since each partition is sent off to a different reducer and no order
across partitions is needed. Total-order partitioning may be an
exception here, perhaps.
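
To make the idea concrete, here is a rough sketch in plain Java. It is
not the actual MapTask code: the class and method names are made up for
illustration, and only the hash formula mirrors what a hash partitioner
typically does. Keys are first assigned to a partition, and each
partition is then sorted on its own, so keys in different partitions
are never compared with each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionThenSort {

    // Hypothetical stand-in for a hash partitioner: key -> partition index.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Bucket the keys by partition, then sort each bucket independently.
    static List<List<String>> partitionThenSort(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        // Key comparisons only happen between keys of the same partition.
        for (List<String> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("d", "a", "c", "b");
        // With 2 partitions this prints [[b, d], [a, c]]
        System.out.println(partitionThenSort(keys, 2));
    }
}
```

Each inner list stands for what one reducer would receive: sorted on its
own, with no ordering implied between the lists.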

> Q3)
>
> Are there any documents containing explanations about how such internal
> classes are implemented?

There's a very good presentation you may want to see, covering the
spill/shuffle/sort parts of the framework that your questions are about:
http://www.slideshare.net/hadoopusergroup/ordered-record-collection

HTH :)

-- 
Harsh J
www.harshj.com
