Re: Shuffle/sort

Harsh J Wed, 06 Jun 2012 09:39:48 -0700

No (sorry if I confused you), the outputs are pulled from the TaskTrackers'
HTTP servers, which access the maps' local (mapred.local.dir) output files
and serve them to the requester (the reduce process). There is no
'push' in MR in this phase.
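
To make the pull model concrete, here is a minimal in-memory sketch in Python. It is not Hadoop code. ToyTaskTracker, partition_for, and run_reduce are hypothetical names; a plain method call stands in for the HTTP request; a dict stands in for the files under mapred.local.dir; and the byte-sum partitioner is an assumption chosen only so the example is deterministic. It also shows why sorted map outputs are convenient: the reduce side can merge the pulled segments and group equal keys in a single pass.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def partition_for(key, num_reduces):
    # Toy deterministic partitioner (assumption for this sketch only).
    return sum(key.encode("utf-8")) % num_reduces


class ToyTaskTracker:
    """Hypothetical stand-in for a TaskTracker that serves local map outputs."""

    def __init__(self):
        # (map_id, partition) -> list of (key, value) pairs sorted by key.
        # Plays the role of map output files under mapred.local.dir.
        self._local_outputs = {}

    def run_map(self, map_id, records, num_reduces):
        # Split the map's records by target reduce, then sort each split by key.
        partitions = [[] for _ in range(num_reduces)]
        for key, value in records:
            partitions[partition_for(key, num_reduces)].append((key, value))
        for p, pairs in enumerate(partitions):
            self._local_outputs[(map_id, p)] = sorted(pairs, key=itemgetter(0))

    def serve(self, map_id, partition):
        # Stand-in for the HTTP response: data leaves only when a reduce asks.
        return list(self._local_outputs.get((map_id, partition), []))


def run_reduce(partition, tracker_by_map):
    # Pull this reduce's partition from every map's tracker (no push anywhere).
    segments = [
        tracker.serve(map_id, partition)
        for map_id, tracker in tracker_by_map.items()
    ]
    # Each segment is already sorted, so a streaming merge yields equal keys
    # next to each other and groupby can aggregate them.
    merged = heapq.merge(*segments, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(merged, key=itemgetter(0))
    }


tt_a, tt_b = ToyTaskTracker(), ToyTaskTracker()
tt_a.run_map("m0", [("b", 1), ("a", 1), ("b", 1)], num_reduces=2)
tt_b.run_map("m1", [("a", 1), ("c", 1)], num_reduces=2)
locations = {"m0": tt_a, "m1": tt_b}
results = [run_reduce(p, locations) for p in range(2)]
```

In this sketch the map side only writes and waits; every transfer is started by run_reduce, which is the sense in which there is no 'push'.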


On Wed, Jun 6, 2012 at 10:06 PM, Barry, Sean F <sean.f.ba...@intel.com> wrote:
> So I'm assuming that there is a push side also? Is it part of the map output?
> -sb
>
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Wednesday, June 06, 2012 9:33 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Shuffle/sort
>
> Sean,
>
> Yes, that's the one for the shuffle that happens on the reduce side (pull
> model). You can drill down from that class to see how the fetchers operate,
> etc.
>
> On Wed, Jun 6, 2012 at 9:54 PM, Barry, Sean F <sean.f.ba...@intel.com> wrote:
>> Thanks Harsh!
>> And is this the right source code for the shuffling that is done in the 
>> reduce task?
>>
>> http://search-hadoop.com/c/Hadoop:/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java%7C%7Cshuffle+sort
>>
>> -sb
>>
>> -----Original Message-----
>> From: Harsh J [mailto:ha...@cloudera.com]
>> Sent: Tuesday, June 05, 2012 7:43 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: Shuffle/sort
>>
>> Hey Sean,
>>
>> Check out
>> http://www.slideshare.net/jhammerb/hadoop-map-reduce-arch-106883,
>> a slightly dated and MR1-oriented presentation from Owen O'Malley that goes 
>> into a good level of depth and gives an overview of how things work 
>> (including how reduces pull data).
>>
>> After that, check out Chris Douglas'
>> http://www.slideshare.net/hadoopusergroup/ordered-record-collection
>> which goes in depth into the evolution of the implementations of that layer. 
>> This is pretty much the state of 0.20/1.0 today too. In 2.0, Netty has 
>> replaced Jetty, among other improvements, but I don't have a public document 
>> link to share on this yet. Others may share docs on the 2.0 changes if they 
>> have a link to one (or I'll respond as soon as I have one).
>>
>> I hope this helps!
>>
>> On Wed, Jun 6, 2012 at 4:16 AM, Barry, Sean F <sean.f.ba...@intel.com> wrote:
>>> "I was always wondering after mapping, how each reduce task get its
>>> input. It is said in google's paper and hadoop's documentation that a
>>> sort is done to aggregate the same key of the map output. But there
>>> is no detailed explanation of how it is implemented and my intuition
>>> is that perhaps a global hashing will work better than sorting. So I
>>> really want to know the details and see whether my intuition is right. If I 
>>> can find out that in the source code, where should I start with?"
>>>
>>> I saw this question online and no one replied to it. Does anyone know where 
>>> I should go to study the source code for the shuffle and sort?
>>>
>>> -sean
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



-- 
Harsh J
