[ https://issues.apache.org/jira/browse/HADOOP-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530852 ]

Runping Qi commented on HADOOP-1950:
------------------------------------

Just realized that the Hadoop job tracker already creates one history file per 
map/reduce job.
Most of the data this Jira requests can be regenerated from there.


> Job Tracker needs to collect more job/task execution stats and save them to 
> DFS file
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1950
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1950
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: Runping Qi
>
> In order to facilitate offline analysis of the dynamic behavior and 
> performance characteristics of map/reduce jobs, 
> we need the job tracker to collect some data about jobs and save them to DFS 
> files. Some data are in time series form, 
> and some are not.
> Below is a preliminary list of desired data. Some of them are already 
> available in the current job trackers. Some are new.
> For each map/reduce job, we need the following non time series data:
>    1. jobid, jobname,  number of mappers, number of reducers, start time, end 
> time, end of mapper phase
>    2. Average (median, min, max) of successful mapper execution time, 
> input/output records/bytes
>    3. Average (median, min, max) of unsuccessful mapper execution time, 
> input/output records/bytes
>    4. Total mapper retries; max and average number of retries per mapper
>    5. The reasons for mapper task failures.
>    6. Average (median, min, max) of successful reducer execution time, 
> input/output records/bytes
>            Execution time is the difference between the sort end time and the 
> task end time
>    7. Average (median, min, max) of successful copy time (from the mapper 
> phase end time  to the sort start time).
>    8. Average (median, min, max) of successful sorting time for successful 
> reducers
>    9. Average (median, min, max) of unsuccessful reducer execution time (from 
> the end of mapper phase or the start of the task, 
>        whichever later, to the end of task)
>    10. Total reducer retries; max and average number of retries per reducer
>    11. The reasons for reducer task failures (user code error, lost tracker, 
> failed to write to DFS, etc.)
> For each map/reduce job, we collect the following time series data (at 
> one-minute intervals):
>     1. Numbers of pending mappers, reducers
>     2. Number of running mappers, reducers
> For the job tracker, we need the following data:
>     1. Number of trackers 
>     2. Start time 
>     3. End time 
>     4. The list of map reduce jobs (their ids, starttime/endtime)
>     
> And the following time series data (at one-minute intervals):
>     1. The number of running jobs
>     2. The numbers of running mappers/reducers
>     3. The number of pending mappers/reducers 
> The data collection should be optional. That is, a job tracker can turn off 
> such data collection, in which case it should not pay the cost.
> The job tracker should organize the in memory version of the collected data 
> in such a way that:
> 1. it does not consume an excessive amount of memory
> 2. the data may be suitable for presenting through the Web status pages.
> The data saved on DFS files should be in hadoop record format.
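As a rough illustration of the per-metric summaries the issue asks for (items 2, 3, 6, etc.), the snippet below computes the average/median/min/max of a set of task execution times. This is only a sketch with invented sample data, not code from the job tracker (which is Java); the function name `summarize` is hypothetical.

```python
# Hypothetical sketch of the avg/median/min/max summary requested per metric.
import statistics

def summarize(values):
    # One summary record per metric, as the issue requests.
    return {
        "avg": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

# Invented execution times (seconds) for the successful mappers of one job.
mapper_times = [12.0, 15.5, 11.2, 60.3, 14.8]
print(summarize(mapper_times))
```

The same summary would apply to each of the listed metrics (execution time, input/output records, input/output bytes), computed separately over successful and unsuccessful tasks.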

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
