[
https://issues.apache.org/jira/browse/HADOOP-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530852
]
Runping Qi commented on HADOOP-1950:
------------------------------------
Just realized that the Hadoop job tracker already creates one history file per
map/reduce job.
Most of the data this JIRA requests can be regenerated from there.
> Job Tracker needs to collect more job/task execution stats and save them to
> DFS file
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-1950
> URL: https://issues.apache.org/jira/browse/HADOOP-1950
> Project: Hadoop
> Issue Type: New Feature
> Reporter: Runping Qi
>
> In order to facilitate offline analysis of the dynamic behavior and
> performance characteristics of map/reduce jobs,
> we need the job tracker to collect some data about jobs and save them to DFS
> files. Some data are in time series form,
> and some are not.
> Below is a preliminary list of desired data. Some of them are already
> available in the current job tracker; some are new.
> For each map/reduce job, we need the following non time series data:
> 1. jobid, jobname, number of mappers, number of reducers, start time, end
> time, end time of the mapper phase
> 2. Average (median, min, max) of successful mapper execution time,
> input/output records/bytes
> 3. Average (median, min, max) of unsuccessful mapper execution time,
> input/output records/bytes
> 4. Total mapper retries, and the max and average number of retries per mapper
> 5. The reasons for mapper task failures.
> 6. Average (median, min, max) of successful reducer execution time,
> input/output records/bytes.
> Execution time is the difference between the sort end time and the
> task end time.
> 7. Average (median, min, max) of successful copy time (from the mapper
> phase end time to the sort start time).
> 8. Average (median, min, max) of successful sorting time for successful
> reducers
> 9. Average (median, min, max) of unsuccessful reducer execution time (from
> the end of the mapper phase or the start of the task,
> whichever is later, to the end of the task)
> 10. Total reducer retries, and the max and average number of retries per reducer
> 11. The reasons for reducer task failures (user code error, lost tracker,
> failed to write to DFS, etc.)
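> A minimal sketch (in Java) of how the summary statistics above could be
> computed; the class name and its use are hypothetical, not an existing
> Hadoop API:
>
>   import java.util.ArrayList;
>   import java.util.Collections;
>   import java.util.List;
>
>   public class TaskStatsSummary {
>     private final List<Long> samples = new ArrayList<Long>();
>
>     public void add(long execTimeMillis) {
>       samples.add(execTimeMillis);
>     }
>
>     /** Returns {min, median, max, average}, or null if there are no samples. */
>     public long[] summarize() {
>       if (samples.isEmpty()) return null;
>       List<Long> sorted = new ArrayList<Long>(samples);
>       Collections.sort(sorted);
>       long sum = 0;
>       for (long s : sorted) sum += s;
>       return new long[] { sorted.get(0),                  // min
>                           sorted.get(sorted.size() / 2),  // median
>                           sorted.get(sorted.size() - 1),  // max
>                           sum / sorted.size() };          // average
>     }
>   }
>
> The same class would be reused for input/output record and byte counts.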
> For each map/reduce job, we need to collect the following time series data
> (at one-minute intervals; a sampling sketch follows the job tracker lists below):
> 1. Number of pending mappers/reducers
> 2. Number of running mappers/reducers
> For the job tracker, we need the following data:
> 1. Number of trackers
> 2. Start time
> 3. End time
> 4. The list of map/reduce jobs (their ids, start time/end time)
>
> And the following time series data (at one-minute intervals):
> 1. The number of running jobs
> 2. The number of running mappers/reducers
> 3. The number of pending mappers/reducers
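> A sketch of the one-minute sampler; JobTrackerView and its accessors are
> hypothetical stand-ins for whatever bookkeeping the job tracker already
> has, and the sample storage is elided:
>
>   import java.util.Timer;
>   import java.util.TimerTask;
>
>   interface JobTrackerView {  // hypothetical read-only view of the tracker
>     int getRunningJobCount();
>     int getRunningMapCount();
>     int getRunningReduceCount();
>     int getPendingMapCount();
>     int getPendingReduceCount();
>   }
>
>   public class ClusterSeriesSampler {
>     private final Timer timer = new Timer(true);  // daemon thread
>
>     public void start(final JobTrackerView tracker) {
>       timer.scheduleAtFixedRate(new TimerTask() {
>         public void run() {
>           long now = System.currentTimeMillis();
>           record(now, "runningJobs", tracker.getRunningJobCount());
>           record(now, "runningMaps", tracker.getRunningMapCount());
>           record(now, "runningReduces", tracker.getRunningReduceCount());
>           record(now, "pendingMaps", tracker.getPendingMapCount());
>           record(now, "pendingReduces", tracker.getPendingReduceCount());
>         }
>       }, 0L, 60 * 1000L);  // sample immediately, then once a minute
>     }
>
>     private void record(long ts, String series, int value) {
>       // append (ts, value) to the in-memory series; storage elided
>     }
>   }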
> The data collection should be optional. That is, a job tracker should be
> able to turn off such data collection, and in that case it should not pay
> the cost.
> The job tracker should organize the in-memory version of the collected data
> in such a way that:
> 1. it does not consume an excessive amount of memory
> 2. the data are suitable for presentation through the web status pages.
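> One way to meet the memory constraint is to keep only constant-size running
> aggregates per metric instead of raw sample lists, as sketched below; note
> that an exact median does need the raw samples, so a bounded design would
> either approximate it or compute it offline from the DFS file. The config
> key in the comment is hypothetical, not an existing Hadoop property:
>
>   // guard at collection sites, e.g.:
>   // if (!conf.getBoolean("mapred.jobtracker.stats.collect", false)) return;
>   public class RunningAggregate {
>     private long count, sum;
>     private long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
>
>     public void add(long v) {
>       count++;
>       sum += v;
>       if (v < min) min = v;
>       if (v > max) max = v;
>     }
>
>     public long average() { return count == 0 ? 0 : sum / count; }
>     public long min()     { return min; }
>     public long max()     { return max; }
>   }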
> The data saved to DFS files should be in Hadoop record format.
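> A sketch of persisting one job's summary to DFS. In the real feature the
> record would be declared in record I/O DDL and serialized by the generated
> class; here plain DataOutputStream calls stand in for that serializer, and
> the output path is made up for illustration:
>
>   import java.io.IOException;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class JobStatsWriter {
>     public void write(Configuration conf, String jobId, long startTime,
>                       long endTime, int numMaps, int numReduces)
>         throws IOException {
>       FileSystem fs = FileSystem.get(conf);
>       Path out = new Path("/jobtracker/stats/" + jobId + ".stats");
>       FSDataOutputStream stream = fs.create(out);
>       try {
>         stream.writeUTF(jobId);
>         stream.writeLong(startTime);
>         stream.writeLong(endTime);
>         stream.writeInt(numMaps);
>         stream.writeInt(numReduces);
>       } finally {
>         stream.close();
>       }
>     }
>   }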
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.