In the case of Hadoop, the reducers can start while maps are still running,
because the shuffle phase can begin pulling map outputs as soon as they are
available. This overlaps the map and shuffle phases. The actual reduce
happens only after all the maps have completed and the map output meant for
that reduce has been sorted. So even in Hadoop the reduce function is
applied only after all the maps finish; the reducers just start early, in
parallel, for shuffling.
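For what it's worth, later Hadoop releases expose this behaviour as a
tunable: the fraction of maps that must complete before reducers are even
launched to begin shuffling. A sketch of the relevant mapred-site.xml
entry, assuming the classic property name (newer releases renamed it to
mapreduce.job.reduce.slowstart.completedmaps, so check your version):

```xml
<!-- mapred-site.xml fragment (property name assumed for 0.x-era Hadoop).
     Launch reducers, i.e. start the shuffle, once 5% of maps have
     completed; raise the value toward 1.0 to hold reducers back until
     nearly all maps are done. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
</property>
```

Note this only controls when shuffling starts; the reduce function itself
still runs only after every map has finished, as described above.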
Amar
On Mon, 3 Mar 2008, momina khan wrote:
hi all,
as seen in the video lectures from Google, their MapReduce ensures
that all maps finish before reduces begin ... their reason for
ensuring this is that not all reduce functions are necessarily
idempotent....
i just wanted to confirm whether hadoop too follows the same
philosophy? do all maps end before reduces begin, or can they go on
in parallel? because that is the impression you get from the hadoop code!
cheers
momina