I'm not quite sure what you mean by the Shuffle algorithm but briefly for each (key,value) pair, a hash value is computed and then the record is sent to a node by taking the modulo of that hash. So if you have n reducers, the record goes to reducer # : hash(key) % n.
The sorting algorithm is most probably a variation of the external sorting algorithm used for sorting objects that don't fit in the memory. See: http://en.wikipedia.org/wiki/External_sorting for more details. Jim On Wed, Apr 28, 2010 at 1:16 PM, Dan Fundatureanu <[email protected]> wrote: > What is the algorithm used in "Shuffle and Sort" step? >
