Hey Sean,

Check out http://www.slideshare.net/jhammerb/hadoop-map-reduce-arch-106883,
a slightly dated, MR1-oriented presentation from Owen O'Malley. It goes
into enough depth to give a good overview of how things work
(including how reduces pull data).

After that, check out Chris Douglas'
http://www.slideshare.net/hadoopusergroup/ordered-record-collection,
which goes in-depth into how the implementation of that layer has
evolved. It is pretty much the state of 0.20/1.0 today too. In 2.0,
Netty has replaced Jetty, among other improvements, but I don't have
a public document link to share on this yet. Others may share docs on
the 2.0 changes if they have a link to one (or I'll respond back as
soon as I have one).
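
For a rough intuition before diving into the slides, here is a toy
Python sketch of the overall flow. It is not Hadoop code: the function
names and the string hash are made up for illustration, and only the
partition formula is modeled on the default HashPartitioner,
(hashCode & Integer.MAX_VALUE) % numReduceTasks. Each map task buckets
its output by a hash of the key to pick a reduce and sorts each bucket
by key; each reduce then merges the sorted runs it fetches from all the
maps.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def stable_hash(key: str) -> int:
    """Deterministic stand-in for a key's hashCode (hypothetical helper)."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def partition(key: str, num_reduces: int) -> int:
    """Pick the reduce for a key: hash, clear the sign bit, take the modulus."""
    return (stable_hash(key) & 0x7FFFFFFF) % num_reduces


def map_side(records, num_reduces):
    """Bucket one map task's output by partition, then sort each bucket by key."""
    buckets = [[] for _ in range(num_reduces)]
    for key, value in records:
        buckets[partition(key, num_reduces)].append((key, value))
    return [sorted(b, key=itemgetter(0)) for b in buckets]


def reduce_side(map_outputs, reduce_id):
    """Take this reduce's sorted run from every map, merge, and group by key."""
    runs = [out[reduce_id] for out in map_outputs]
    merged = heapq.merge(*runs, key=itemgetter(0))
    return [(k, [v for _, v in grp])
            for k, grp in groupby(merged, key=itemgetter(0))]


def word_count(splits, num_reduces=2):
    """Toy job: each input split is one map task emitting (word, 1)."""
    map_outputs = [map_side([(w, 1) for w in s.split()], num_reduces)
                   for s in splits]
    result = {}
    for r in range(num_reduces):
        for key, values in reduce_side(map_outputs, r):
            result[key] = sum(values)
    return result
```

So hashing and sorting are not really alternatives here: the hash only
decides which reduce gets a key, while the sort lets each reduce group
equal keys with a streaming merge instead of holding everything in
memory.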

I hope this helps!

On Wed, Jun 6, 2012 at 4:16 AM, Barry, Sean F <sean.f.ba...@intel.com> wrote:
> "I was always wondering after mapping, how each reduce task get its input.
> It is said in google's paper and hadoop's documentation that a sort is done
> to aggregate the same key of the map output. But there is no detailed
> explanation of how it is implemented and my intuition is that perhaps a
> global hashing will work better than sorting. So I really want to know the
> details and see whether my intuition is right. If I can find out that in
> the source code, where should I start with?"
>
> I saw this question online and no one replied to it. Does anyone know where
> I can go to study the source code for the shuffle and sort?
>
> -sean



-- 
Harsh J
