Thank you, and I think this might also hold the answer to my question. In "hadoop-default.xml":
<property>
  <name>map.sort.class</name>
  <value>org.apache.hadoop.util.QuickSort</value>
  <description>The default sort class for sorting keys.</description>
</property>

On Wed, Apr 28, 2010 at 3:27 PM, Patrick Angeles <[email protected]> wrote:

> Dan,
>
> Shuffle and Sort is a combination of multiple 'algorithms':
>
> - Map output goes to a circular, in-memory buffer.
> - When this starts filling up, it gets 'spilled' to disk.
> - Spilling involves writing each K/V pair to a partition-specific file
>   (where the partition is computed by the algorithm Jim describes below),
>   in order sorted by K.
> - You may get multiple files per partition (you get a file per partition
>   every time a spill happens).
> - In which case, the spill files get merge-sorted into larger files.
> - Reducers pick up the final merged files from multiple mappers.
> - Reducers may pick up multiple of these files from several mappers.
> - The reducer may perform a final merge before starting the reduce phase.
>
> On Wed, Apr 28, 2010 at 2:47 PM, Jim Twensky <[email protected]> wrote:
>
> > I'm not quite sure what you mean by the Shuffle algorithm, but briefly:
> > for each (key, value) pair, a hash value is computed and then the
> > record is sent to a node by taking the modulo of that hash. So if you
> > have n reducers, the record goes to reducer #: hash(key) % n.
> >
> > The sorting algorithm is most probably a variation of the external
> > sorting algorithm used for sorting objects that don't fit in memory.
> > See:
> >
> > http://en.wikipedia.org/wiki/External_sorting
> >
> > for more details.
> >
> > Jim
> >
> > On Wed, Apr 28, 2010 at 1:16 PM, Dan Fundatureanu
> > <[email protected]> wrote:
> > > What is the algorithm used in the "Shuffle and Sort" step?
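The hash(key) % n scheme Jim describes is essentially what Hadoop's default HashPartitioner does. A minimal standalone sketch (not the actual Hadoop class; the mask with Integer.MAX_VALUE clears the sign bit so the modulo result is never negative, since hashCode() can be negative in Java):

```java
// Sketch of hash(key) % n partitioning, in the spirit of Hadoop's
// default HashPartitioner. Masking with Integer.MAX_VALUE clears the
// sign bit so the result of the modulo is always in [0, numReduceTasks).
public class SimpleHashPartitioner {
    public static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int n = 4; // number of reducers, a made-up value for illustration
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> reducer " + getPartition(key, n));
        }
    }
}
```

Note that all values sharing a key hash to the same partition, which is what guarantees a given key is processed by exactly one reducer.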

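To make the external-sort idea concrete, here is a toy k-way merge in Java. This is only a sketch of the merge step Patrick mentions: real spill files are sorted runs on disk, whereas these "runs" are in-memory lists, and real Hadoop merges serialized K/V records, not integers.

```java
import java.util.*;

// Toy k-way merge over already-sorted "runs" (standing in for sorted
// spill files on disk). A priority queue always yields the smallest
// current head element across all runs, producing one sorted stream.
public class KWayMerge {
    public static List<Integer> merge(List<List<Integer>> runs) {
        // Each queue entry holds {value, runIndex, offsetWithinRun}.
        PriorityQueue<int[]> pq =
            new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                pq.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            out.add(top[0]);
            int r = top[1], next = top[2] + 1;
            // Advance within the run the element came from.
            if (next < runs.get(r).size()) {
                pq.add(new int[] {runs.get(r).get(next), r, next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
            Arrays.asList(1, 4, 9),
            Arrays.asList(2, 3, 8),
            Arrays.asList(5, 6, 7));
        System.out.println(merge(runs)); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

The key property is that only the head of each run needs to be in memory at once, which is why this works for data far larger than RAM.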