Agreed.

At least, I believe the new web-ui for MRv2 is (or will be soon) more verbose 
about this.

On Sep 18, 2011, at 9:23 PM, Kai Voigt wrote:

> Hi,
> 
> these 0-33-66-100% phases are really confusing to beginners. We see that in 
> our training classes. The output should be more verbose, such as breaking 
> down the phases into separate progress numbers.
> 
> Does that make sense?
> 
> On 19.09.2011 at 06:17, Arun C Murthy wrote:
> 
>> Nan,
>> 
>> The 'phase' is implied by the 'progress' value reported by the 
>> map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).
>> 
>> For example:
>> Reduce: 
>> 0-33% -> Shuffle
>> 34-66% -> Sort (actually, just 'merge', there is no sort in the reduce since 
>> all map-outputs are sorted)
>> 67-100% -> Reduce
>> 
>> From 0.23 onwards, the map has phases too:
>> 0-90% -> Map
>> 91-100% -> Final Sort/merge
>> 
>> Now, about starting reduces early: this is done so that the shuffle can 
>> proceed for completed maps while the rest of the maps run, thereby pipelining 
>> shuffle and map completion. There is a 'reduce slowstart' feature to control 
>> this: by default, reduces aren't started until 5% of maps are complete. 
>> Users can set this higher.
>> 
>> Arun
>> 
>> On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:
>> 
>>> Hi, all
>>> 
>>> recently, I was struck by a question: "how is a hadoop job divided into 2
>>> phases?"
>>> 
>>> In textbooks, we are told that mapreduce jobs are divided into 2 phases,
>>> map and reduce, and that reduce is further divided into 3 stages:
>>> shuffle, sort, and reduce. But in the hadoop code, I had never thought about
>>> this question; I didn't see any member variables in the JobInProgress class
>>> that indicate this information.
>>> 
>>> According to my understanding of the hadoop source code, the reduce
>>> tasks are not necessarily held back until all mappers are finished; on the
>>> contrary, we can see reduce tasks in the shuffle stage while some
>>> mappers are still running.
>>> So how can I tell which phase the job is in?
>>> 
>>> Thanks
>>> -- 
>>> Nan Zhu
>>> School of Electronic, Information and Electrical Engineering,229
>>> Shanghai Jiao Tong University
>>> 800,Dongchuan Road,Shanghai,China
>>> E-Mail: [email protected]
>> 
>> 
> 
> -- 
> Kai Voigt
> [email protected]
> 
> 
> 
> 
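To make the reduce breakdown quoted above concrete, here is a minimal Python sketch of how a single overall progress value could be split into a phase name plus a separate per-phase progress number, along the lines Kai suggests. It is not Hadoop code: the function name `reduce_phase_progress` is made up for illustration, and treating the three phases as exact thirds is an assumption read off the 0-33/34-66/67-100% ranges in Arun's mail.

```python
# Illustrative only: names are hypothetical, and the equal-thirds split is an
# assumption based on the 0-33 / 34-66 / 67-100% ranges quoted in the thread.
REDUCE_PHASES = ("shuffle", "sort", "reduce")


def reduce_phase_progress(overall):
    """Split a reduce task's overall progress (0.0-1.0) into
    (phase name, progress within that phase as 0.0-1.0)."""
    if not 0.0 <= overall <= 1.0:
        raise ValueError("overall progress must be between 0.0 and 1.0")
    share = 1.0 / len(REDUCE_PHASES)
    # Clamp the index so that overall == 1.0 lands in the last phase.
    index = min(int(overall / share), len(REDUCE_PHASES) - 1)
    within = (overall - index * share) / share
    return REDUCE_PHASES[index], within


if __name__ == "__main__":
    for value in (0.10, 0.50, 0.90):
        phase, within = reduce_phase_progress(value)
        print(f"overall {value:.0%}: {phase} at {within:.0%}")
```

With this split, a reduce task showing 50% overall would be reported as halfway through the sort (merge) phase, which is probably easier for beginners to read than the raw number.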

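The 'reduce slowstart' rule from Arun's mail can be sketched the same way. This is a simplified model, not the scheduler's actual code: the function name `reduces_may_start` and its parameters are hypothetical, and the only value taken from the thread is the 5% default.

```python
# Simplified model of the 'reduce slowstart' rule described in the thread.
# The function and parameter names are hypothetical; only the 5% default
# comes from the mail quoted above.
DEFAULT_SLOWSTART = 0.05


def reduces_may_start(completed_maps, total_maps, slowstart=DEFAULT_SLOWSTART):
    """Return True once the fraction of completed maps reaches `slowstart`."""
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    if completed_maps < 0 or completed_maps > total_maps:
        raise ValueError("completed_maps must be between 0 and total_maps")
    # Reduces are held back until enough maps have finished.
    return completed_maps >= slowstart * total_maps


if __name__ == "__main__":
    print(reduces_may_start(4, 100))        # below the 5% default
    print(reduces_may_start(5, 100))        # default threshold reached
    print(reduces_may_start(9, 10, 1.0))    # wait for every map to finish
```

Setting the threshold to 1.0 in this model gives the textbook picture, where no reduce starts until every map is done. The low default is what lets shuffle overlap with the remaining maps.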