Agreed. At least, I believe the new web-ui for MRv2 is (or will be soon) more verbose about this.
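
For illustration, here is a rough sketch of the per-phase breakdown Kai asks for in the quoted message below. It assumes the overall reduce progress is split into three equal thirds (shuffle, sort/merge, reduce), as Arun's ranges suggest. The function and constant names are made up for this example and are not Hadoop APIs.

```python
# Hypothetical helper; only the phase names and ranges come from the thread.
REDUCE_PHASES = ("shuffle", "sort", "reduce")


def split_reduce_progress(overall):
    """Map overall reduce-task progress (0.0-1.0) to per-phase progress.

    Assumes each phase contributes an equal third of the overall value.
    Returns (current_phase_name, {phase_name: progress_within_phase}).
    """
    if not 0.0 <= overall <= 1.0:
        raise ValueError("progress must be between 0.0 and 1.0")
    scaled = overall * len(REDUCE_PHASES)
    # Clamp so that overall == 1.0 still lands in the last phase.
    current = min(int(scaled), len(REDUCE_PHASES) - 1)
    per_phase = {}
    for i, name in enumerate(REDUCE_PHASES):
        if i < current:
            per_phase[name] = 1.0               # phase already finished
        elif i == current:
            per_phase[name] = scaled - current  # fraction of the current phase
        else:
            per_phase[name] = 0.0               # phase not started yet
    return REDUCE_PHASES[current], per_phase


if __name__ == "__main__":
    phase, parts = split_reduce_progress(0.5)
    print(phase, parts)
```

With an overall progress of 50%, this reports the task as halfway through the sort/merge phase with shuffle already complete, which is the kind of output a more verbose UI could show.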
On Sep 18, 2011, at 9:23 PM, Kai Voigt wrote:

> Hi,
>
> these 0-33-66-100% phases are really confusing to beginners. We see that in
> our training classes. The output should be more verbose, such as breaking
> down the phases into separate progress numbers.
>
> Does that make sense?
>
> On 19.09.2011 at 06:17, Arun C Murthy wrote:
>
>> Nan,
>>
>> The 'phase' is implicitly understood by the 'progress' (value) made by the
>> map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).
>>
>> For example:
>> Reduce:
>> 0-33% -> Shuffle
>> 34-66% -> Sort (actually, just 'merge'; there is no sort in the reduce since
>> all map-outputs are sorted)
>> 67-100% -> Reduce
>>
>> With 0.23 onwards the Map has phases too:
>> 0-90% -> Map
>> 91-100% -> Final Sort/merge
>>
>> Now, about starting reduces early - this is done to ensure shuffle can
>> proceed for completed maps while the rest of the maps run, thereby pipelining
>> shuffle and map completion. There is a 'reduce slowstart' feature to control
>> this - by default, reduces aren't started until 5% of maps are complete.
>> Users can set this higher.
>>
>> Arun
>>
>> On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:
>>
>>> Hi, all
>>>
>>> Recently I was hit by a question: "how is a hadoop job divided into 2
>>> phases?"
>>>
>>> In textbooks, we are told that mapreduce jobs are divided into 2 phases,
>>> map and reduce, and that reduce is further divided into 3 stages:
>>> shuffle, sort, and reduce. But in the hadoop code, I never thought about
>>> this question; I didn't see any member variables in the JobInProgress class
>>> that indicate this information.
>>>
>>> According to my understanding of the hadoop source code, the reduce
>>> tasks need not be started until all mappers are finished; in
>>> contrast, we can see reduce tasks in the shuffle stage while there are
>>> mappers still running.
>>> So how can I tell which phase the job is in?
>>>
>>> Thanks
>>> --
>>> Nan Zhu
>>> School of Electronic, Information and Electrical Engineering,229
>>> Shanghai Jiao Tong University
>>> 800,Dongchuan Road,Shanghai,China
>>> E-Mail: [email protected]
>>
>
> --
> Kai Voigt
> [email protected]
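
As a rough illustration of the 'reduce slowstart' rule Arun describes above, the check can be sketched as follows. This is not the actual JobInProgress code; the function name is made up, and rounding the threshold up to a whole number of maps is an assumption. The 5% default is the one quoted in the thread (in 0.20-era configs the setting is, I believe, mapred.reduce.slowstart.completed.maps).

```python
import math


def reduces_may_start(completed_maps, total_maps, slowstart=0.05):
    """Return True once enough maps have finished for reduces to be scheduled.

    slowstart is the fraction of maps that must complete first (0.05 is the
    default mentioned in the thread). Rounding the threshold up is an
    assumption of this sketch.
    """
    threshold = math.ceil(slowstart * total_maps)
    return completed_maps >= threshold


if __name__ == "__main__":
    # With 100 maps and the default setting, 4 finished maps is not enough.
    print(reduces_may_start(4, 100))
    # Setting slowstart to 1.0 delays reduces until every map is done.
    print(reduces_may_start(99, 100, slowstart=1.0))
```

Raising the fraction toward 1.0 trades away the shuffle/map pipelining in exchange for not holding reduce slots while maps are still running.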
