On Mar 2, 2008, at 12:53 PM, momina khan wrote:
I have trouble comprehending what the shuffle phase is exactly. Can anyone please explain it for me, and also point out the name of the class that the shuffle thread runs, and the class that spawns the thread itself?
The shuffle phase is the movement of data from the map outputs to the reduce inputs. In general, each reduce collects its share of the output from every map, which is why it is called the "shuffle". The TaskTracker where a map ran has a Jetty server that serves that map's outputs. The ReduceTask copies the map outputs as the maps finish. You can look at ReduceTask.java for the client side of the shuffle.
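
To make the data motion concrete, here is a rough, in-memory sketch of the idea. It is not the code in ReduceTask.java: there is no Jetty server or copying over HTTP, and the names ShuffleSketch, partitionFor, partitionMapOutput, and shuffleFor are made up for illustration. It assumes keys are assigned to reduces by hashing, splits each map's output into one bucket per reduce, and then has each reduce collect its bucket from every map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Hypothetical partitioner (an assumption for this sketch):
    // picks the reduce that owns a key by hashing the key.
    static int partitionFor(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Map side: split one map's (key, value) output into one bucket per reduce.
    static List<List<String[]>> partitionMapOutput(List<String[]> mapOutput, int numReduces) {
        List<List<String[]>> buckets = new ArrayList<>();
        for (int r = 0; r < numReduces; r++) {
            buckets.add(new ArrayList<>());
        }
        for (String[] kv : mapOutput) {
            buckets.get(partitionFor(kv[0], numReduces)).add(kv);
        }
        return buckets;
    }

    // Reduce side: collect this reduce's bucket from every map and group
    // the values by key. In the real system this step is a network copy.
    static Map<String, List<String>> shuffleFor(int reduce, List<List<List<String[]>>> allMaps) {
        Map<String, List<String>> input = new TreeMap<>();
        for (List<List<String[]>> oneMap : allMaps) {
            for (String[] kv : oneMap.get(reduce)) {
                input.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        return input;
    }

    public static void main(String[] args) {
        int numReduces = 2;
        // Two maps, each emitting (word, "1") pairs.
        List<String[]> map0 = Arrays.asList(new String[] {"a", "1"}, new String[] {"b", "1"});
        List<String[]> map1 = Arrays.asList(new String[] {"a", "1"});

        List<List<List<String[]>>> allMaps = new ArrayList<>();
        allMaps.add(partitionMapOutput(map0, numReduces));
        allMaps.add(partitionMapOutput(map1, numReduces));

        for (int r = 0; r < numReduces; r++) {
            System.out.println("reduce " + r + " input: " + shuffleFor(r, allMaps));
        }
    }
}
```

The point of the sketch is only that every reduce reads from every map, and that all values for a given key end up at the same reduce.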
-- Owen
