Job Tracker needs to collect more job/task execution stats and save them to DFS
files
------------------------------------------------------------------------------------
Key: HADOOP-1950
URL: https://issues.apache.org/jira/browse/HADOOP-1950
Project: Hadoop
Issue Type: New Feature
Reporter: Runping Qi
In order to facilitate offline analysis of the dynamic behavior and
performance characteristics of map/reduce jobs,
we need the job tracker to collect some data about jobs and save it to DFS
files. Some of the data is in time series form,
and some is not.
Below is a preliminary list of the desired data. Some of it is already available
in the current job tracker; some is new.
For each map/reduce job, we need the following non-time-series data (a sketch of
how the min/max/average/median aggregates might be computed follows the list):
1. Job ID, job name, number of mappers, number of reducers, start time, end
time, end of mapper phase
2. Average (median, min, max) of successful mapper execution time and
input/output records/bytes
3. Average (median, min, max) of unsuccessful mapper execution time and
input/output records/bytes
4. Total mapper retries, and the max and average number of retries per mapper
5. The reasons for mapper task failures
6. Average (median, min, max) of successful reducer execution time and
input/output records/bytes (execution time is the difference between the sort
end time and the task end time)
7. Average (median, min, max) of successful copy time (from the mapper phase
end time to the sort start time)
8. Average (median, min, max) of sorting time for successful reducers
9. Average (median, min, max) of unsuccessful reducer execution time (from
the end of the mapper phase or the start of the task, whichever is later, to
the end of the task)
10. Total reducer retries, and the max and average number of retries per reducer
11. The reasons for reducer task failures (user code error, lost tracker,
failed write to DFS, etc.)
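Purely to make the aggregates concrete, here is a minimal, self-contained Java
sketch (not JobTracker code; the class name is made up) of computing min, max,
average, and median over a set of task execution times. The same computation
would apply to the input/output record and byte counts:

import java.util.Arrays;

public class TaskStatsSummary {
  public static void main(String[] args) {
    // Execution times in milliseconds for, say, successful map tasks.
    long[] times = { 4200, 3900, 5100, 4600, 4400 };
    Arrays.sort(times);

    long min = times[0];
    long max = times[times.length - 1];
    long sum = 0;
    for (long t : times) {
      sum += t;
    }
    double avg = (double) sum / times.length;
    // Median: middle element, or mean of the two middle elements.
    int mid = times.length / 2;
    double median = (times.length % 2 == 1)
        ? times[mid]
        : (times[mid - 1] + times[mid]) / 2.0;

    System.out.println("min=" + min + " max=" + max
        + " avg=" + avg + " median=" + median);
  }
}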
For each map/reduce job, we also collect the following time series data (at
one-minute intervals; see the sampling sketch after the list):
1. Number of pending mappers and reducers
2. Number of running mappers and reducers
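The sampling itself could be as simple as a timer that snapshots the counters
once a minute. The sketch below is hypothetical: JobSampleCollector, Sample,
and the count*() placeholders stand in for whatever bookkeeping the JobTracker
already has; none of them are existing Hadoop APIs.

import java.util.ArrayList;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;

public class JobSampleCollector {
  // One sample point in the time series for a single job.
  static class Sample {
    final long timestamp;
    final int pendingMaps, pendingReduces, runningMaps, runningReduces;
    Sample(long ts, int pm, int pr, int rm, int rr) {
      timestamp = ts; pendingMaps = pm; pendingReduces = pr;
      runningMaps = rm; runningReduces = rr;
    }
  }

  private final List<Sample> series = new ArrayList<Sample>();
  private final Timer timer = new Timer(true); // daemon thread

  public void start() {
    // Take one snapshot per minute, starting immediately.
    timer.schedule(new TimerTask() {
      public void run() {
        synchronized (series) {
          series.add(new Sample(System.currentTimeMillis(),
              countPendingMaps(), countPendingReduces(),
              countRunningMaps(), countRunningReduces()));
        }
      }
    }, 0, 60 * 1000);
  }

  // Placeholders for the real JobTracker bookkeeping.
  private int countPendingMaps()    { return 0; }
  private int countPendingReduces() { return 0; }
  private int countRunningMaps()    { return 0; }
  private int countRunningReduces() { return 0; }
}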
For the job tracker itself, we need the following data:
1. Number of trackers
2. Start time
3. End time
4. The list of map/reduce jobs (their IDs and start/end times)
And the following time series data (at one-minute intervals; these cluster-wide
totals could be derived from the per-job samples, as sketched after the list):
1. The number of running jobs
2. The number of running mappers/reducers
3. The number of pending mappers/reducers
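Since the tracker-level counts are just sums over jobs, one option is to derive
them from the per-job samples rather than collect them separately. A hedged
sketch, reusing the hypothetical Sample type from above:

import java.util.List;

public class ClusterSampleAggregator {
  // Sum the latest per-job samples (one per job, taken at the same
  // tick) to get the cluster-wide running-mapper count; the other
  // tracker-level series would be derived the same way.
  public static int totalRunningMaps(List<JobSampleCollector.Sample> latest) {
    int total = 0;
    for (JobSampleCollector.Sample s : latest) {
      total += s.runningMaps;
    }
    return total;
  }
}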
The data collection should be optional: a job tracker should be able to turn
such collection off, and in that case it should not pay the cost.
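A minimal sketch of the opt-in switch, assuming a boolean configuration
property. The key name below is invented for illustration; a real patch would
choose its own:

import org.apache.hadoop.conf.Configuration;

public class StatsCollectionConfig {
  // Hypothetical property name. Defaulting to false means a tracker
  // that does not opt in pays no collection cost.
  public static boolean isEnabled(Configuration conf) {
    return conf.getBoolean("mapred.jobtracker.collect.stats", false);
  }
}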
The job tracker should organize the in-memory version of the collected data in
such a way that:
1. it does not consume an excessive amount of memory, and
2. the data is suitable for presentation through the web status pages.
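One possible way to satisfy the memory constraint, sketched here as an
assumption rather than a design decision, is to hold each time series in a
fixed-capacity ring buffer, flushing samples to DFS before they are
overwritten:

public class BoundedSeries {
  private final long[] values;
  private int next = 0;
  private int size = 0;

  public BoundedSeries(int capacity) {
    values = new long[capacity];
  }

  // Overwrites the oldest sample once the buffer is full, so a
  // long-running job tracker never grows this structure unboundedly.
  public synchronized void add(long v) {
    values[next] = v;
    next = (next + 1) % values.length;
    if (size < values.length) {
      size++;
    }
  }

  public synchronized int size() {
    return size;
  }
}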
The data saved to DFS files should be in the Hadoop record format.
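The write path might look roughly like the following. This is only a sketch of
the FileSystem plumbing, with hand-serialized fields and an invented output
path; the actual patch would serialize through the Hadoop record I/O classes
as stated above:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StatsWriter {
  public static void writeJobStats(Configuration conf, String jobId,
      long startTime, long endTime) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical output location, one file per job.
    Path out = new Path("/jobtracker/stats/" + jobId + ".stats");
    FSDataOutputStream stream = fs.create(out);
    try {
      stream.writeUTF(jobId);
      stream.writeLong(startTime);
      stream.writeLong(endTime);
    } finally {
      stream.close();
    }
  }
}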