RE: Shuffle/sort

Barry, Sean F Wed, 06 Jun 2012 09:25:35 -0700

Thanks, Harsh!
Also, is this the right source code for the shuffle that is done in the reduce 
task?

http://search-hadoop.com/c/Hadoop:/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java%7C%7Cshuffle+sort

-sb

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com] 
Sent: Tuesday, June 05, 2012 7:43 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle/sort

Hey Sean,

Check out http://www.slideshare.net/jhammerb/hadoop-map-reduce-arch-106883,
a slightly dated, MR1-oriented presentation from Owen O'Malley that goes into 
enough depth to give a good overview of how things work (including how 
reduces pull data).

After that, check out Chris Douglas'
http://www.slideshare.net/hadoopusergroup/ordered-record-collection
which goes in depth into how the implementations of that layer have evolved. 
This is pretty much the state of 0.20/1.0 today too. In 2.0, Netty has 
replaced Jetty, among other improvements, but I don't have a public document 
link to share on this yet. Others may share the 2.0 change docs if they 
have a link to one (or I'll respond back as soon as I have one).
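
On the hashing-versus-sorting part of the question, here is a rough sketch of how the two fit together. This is not Hadoop's actual code: the class ShuffleSketch and its methods are made up for illustration, keys are plain strings, and everything lives in memory. The only piece taken from Hadoop is the partition formula, which matches the default HashPartitioner. Hashing decides which reducer a key goes to, and sorting plus a merge on the reduce side puts equal keys next to each other so they can be grouped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical, in-memory illustration only; not the real Hadoop classes.
public class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Map side (simplified): route each key to a partition,
    // then sort every partition by key.
    static List<List<String>> mapSide(List<String> keys, int numReducers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numReducers)).add(key);
        }
        for (List<String> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    // Reduce side (simplified): k-way merge of the sorted runs pulled from
    // each map. Equal keys come out adjacent, so they can be grouped
    // (here: counted) in a single pass.
    static Map<String, Integer> mergeAndCount(List<List<String>> sortedRuns) {
        // Each heap entry is {runIndex, positionInRun}, ordered by the key it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
                sortedRuns.get(a[0]).get(a[1]).compareTo(sortedRuns.get(b[0]).get(b[1])));
        for (int run = 0; run < sortedRuns.size(); run++) {
            if (!sortedRuns.get(run).isEmpty()) {
                heap.add(new int[] {run, 0});
            }
        }
        Map<String, Integer> grouped = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            String key = sortedRuns.get(top[0]).get(top[1]);
            grouped.merge(key, 1, Integer::sum);
            if (top[1] + 1 < sortedRuns.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        int numReducers = 2;
        // Output keys of two hypothetical map tasks.
        List<List<String>> map1 = mapSide(Arrays.asList("b", "a", "c", "a"), numReducers);
        List<List<String>> map2 = mapSide(Arrays.asList("c", "a", "d"), numReducers);
        for (int r = 0; r < numReducers; r++) {
            // Each reducer pulls its own partition from every map and merges them.
            Map<String, Integer> grouped = mergeAndCount(Arrays.asList(map1.get(r), map2.get(r)));
            System.out.println("reducer " + r + " -> " + grouped);
        }
    }
}
```

The real implementation also spills to disk, can run combiners, and fetches map output over HTTP, so treat this only as a mental model before reading the source.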

I hope this helps!

On Wed, Jun 6, 2012 at 4:16 AM, Barry, Sean F <sean.f.ba...@intel.com> wrote:
> "I was always wondering after mapping, how each reduce task get its 
> input. It is said in google's paper and hadoop's documentation that a 
> sort is done to aggregate the same key of the map output. But there is 
> no detailed explanation of how it is implemented and my intuition is 
> that perhaps a global hashing will work better than sorting. So I 
> really want to know the details and see whether my intuition is right. If I 
> can find out that in the source code, where should I start with?"
>
> I saw this question online and no one replied to it. Does anyone know where I 
> should go to study the source code for the shuffle and sort?
>
> -sean

--
Harsh J
