Hi all,

Recently I was hit by a question: "How is a Hadoop job divided into 2 phases?"
In textbooks, we are told that a MapReduce job is divided into 2 phases, map and reduce, and that reduce is further divided into 3 stages: shuffle, sort, and reduce. In the Hadoop code, however, I had never thought about this question. I don't see any member variable in the JobInProgress class that records this information. According to my understanding of the Hadoop source code, reduce tasks do not necessarily wait until all mappers have finished before they start; on the contrary, we can see reduce tasks in the shuffle stage while some mappers are still running.

So how can I tell which phase a job is currently in?

Thanks,

--
Nan Zhu
School of Electronic, Information and Electrical Engineering, 229
Shanghai Jiao Tong University
800, Dongchuan Road, Shanghai, China
E-Mail: [email protected]
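P.S. To make the question concrete, here is a minimal sketch of the kind of job-level phase I have in mind. It is only an illustration under my own assumptions: the class `JobPhaseSketch`, the `Phase` enum, and the `phaseOf` method are hypothetical names, not anything that exists in Hadoop, and the task counters are plain integers rather than fields of JobInProgress. It derives a phase from how many map and reduce tasks have started or finished, and it allows for reduces that are already shuffling while mappers are still running.

```java
// Hypothetical sketch: none of these names exist in Hadoop.
// It only illustrates deriving a job-level phase from task counts.
final class JobPhaseSketch {

    // MAP_AND_SHUFFLE covers the overlap where some reduce tasks have
    // already started (and are shuffling) while mappers are still running.
    enum Phase { MAP, MAP_AND_SHUFFLE, REDUCE, DONE }

    static Phase phaseOf(int totalMaps, int finishedMaps,
                         int totalReduces, int startedReduces,
                         int finishedReduces) {
        if (finishedMaps < totalMaps) {
            // Mappers are still running; check whether any reducer overlaps.
            return startedReduces > 0 ? Phase.MAP_AND_SHUFFLE : Phase.MAP;
        }
        if (finishedReduces < totalReduces) {
            // All map output exists; the remaining work is on the reduce side.
            return Phase.REDUCE;
        }
        // Also reached for a map-only job (totalReduces == 0).
        return Phase.DONE;
    }

    public static void main(String[] args) {
        System.out.println(phaseOf(10, 4, 2, 0, 0));   // MAP
        System.out.println(phaseOf(10, 7, 2, 1, 0));   // MAP_AND_SHUFFLE
        System.out.println(phaseOf(10, 10, 2, 2, 1));  // REDUCE
        System.out.println(phaseOf(10, 10, 2, 2, 2));  // DONE
    }
}
```

Whether the overlap should count as its own phase, or still as "map", is exactly the part I am unsure about.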
