Job Tracker needs to collect more job/task execution stats and save them to DFS
files
------------------------------------------------------------------------------------
Key: HADOOP-1950
URL: https://issues.apache.org/jira/browse/HADOOP-1950
Project: Hadoop
Issue Type: New Feature
Reporter: Runping Qi
In order to facilitate offline analysis of the dynamic behavior and
performance characteristics of map/reduce jobs,
we need the job tracker to collect some data about jobs and save it to DFS
files. Some of the data is in time series form,
and some is not.
Below is a preliminary list of the desired data. Some of it is already available
in the current job tracker; some is new.
For each map/reduce job, we need the following non-time-series data (a sketch of
how the min/max/average/median aggregates might be computed follows the list):
1. Job ID, job name, number of mappers, number of reducers, start time, end
time, end of mapper phase
2. Average (median, min, max) of successful mapper execution time and
input/output records/bytes
3. Average (median, min, max) of unsuccessful mapper execution time and
input/output records/bytes
4. Total mapper retries, and the max and average number of retries per mapper
5. The reasons for mapper task failures
6. Average (median, min, max) of successful reducer execution time and
input/output records/bytes (execution time is the difference between the sort
end time and the task end time)
7. Average (median, min, max) of successful copy time (from the mapper phase
end time to the sort start time)
8. Average (median, min, max) of sorting time for successful reducers
9. Average (median, min, max) of unsuccessful reducer execution time (from
the end of the mapper phase or the start of the task, whichever is later, to
the end of the task)
10. Total reducer retries, and the max and average number of retries per reducer
11. The reasons for reducer task failures (user code error, lost tracker,
failed write to DFS, etc.)
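Purely to make the aggregates concrete, here is a minimal, self-contained Java
sketch (not JobTracker code; the class name is made up) of computing min, max,
average, and median over a set of task execution times. The same computation
would apply to the input/output record and byte counts:

import java.util.Arrays;

public class TaskStatsSummary {
  public static void main(String[] args) {
    // Execution times in milliseconds for, say, successful map tasks.
    long[] times = { 4200, 3900, 5100, 4600, 4400 };
    Arrays.sort(times);

    long min = times[0];
    long max = times[times.length - 1];
    long sum = 0;
    for (long t : times) {
      sum += t;
    }
    double avg = (double) sum / times.length;
    // Median: middle element, or mean of the two middle elements.
    int mid = times.length / 2;
    double median = (times.length % 2 == 1)
        ? times[mid]
        : (times[mid - 1] + times[mid]) / 2.0;

    System.out.println("min=" + min + " max=" + max
        + " avg=" + avg + " median=" + median);
  }
}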
For each map/reduce job, we also collect the following time series data (at
one-minute intervals; see the sampling sketch after the list):
1. Number of pending mappers and reducers
2. Number of running mappers and reducers
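The sampling itself could be as simple as a timer that snapshots the counters
once a minute. The sketch below is hypothetical: JobSampleCollector, Sample,
and the count*() placeholders stand in for whatever bookkeeping the JobTracker
already has; none of them are existing Hadoop APIs.

import java.util.ArrayList;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;

public class JobSampleCollector {
  // One sample point in the time series for a single job.
  static class Sample {
    final long timestamp;
    final int pendingMaps, pendingReduces, runningMaps, runningReduces;
    Sample(long ts, int pm, int pr, int rm, int rr) {
      timestamp = ts; pendingMaps = pm; pendingReduces = pr;
      runningMaps = rm; runningReduces = rr;
    }
  }

  private final List<Sample> series = new ArrayList<Sample>();
  private final Timer timer = new Timer(true); // daemon thread

  public void start() {
    // Take one snapshot per minute, starting immediately.
    timer.schedule(new TimerTask() {
      public void run() {
        synchronized (series) {
          series.add(new Sample(System.currentTimeMillis(),
              countPendingMaps(), countPendingReduces(),
              countRunningMaps(), countRunningReduces()));
        }
      }
    }, 0, 60 * 1000);
  }

  // Placeholders for the real JobTracker bookkeeping.
  private int countPendingMaps()    { return 0; }
  private int countPendingReduces() { return 0; }
  private int countRunningMaps()    { return 0; }
  private int countRunningReduces() { return 0; }
}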
For the job tracker itself, we need the following data:
1. Number of trackers
2. Start time
3. End time
4. The list of map/reduce jobs (their IDs and start/end times)
And the following time series data (at one-minute intervals; these cluster-wide
totals could be derived from the per-job samples, as sketched after the list):
1. The number of running jobs
2. The number of running mappers/reducers
3. The number of pending mappers/reducers
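Since the tracker-level counts are just sums over jobs, one option is to derive
them from the per-job samples rather than collect them separately. A hedged
sketch, reusing the hypothetical Sample type from above:

import java.util.List;

public class ClusterSampleAggregator {
  // Sum the latest per-job samples (one per job, taken at the same
  // tick) to get the cluster-wide running-mapper count; the other
  // tracker-level series would be derived the same way.
  public static int totalRunningMaps(List<JobSampleCollector.Sample> latest) {
    int total = 0;
    for (JobSampleCollector.Sample s : latest) {
      total += s.runningMaps;
    }
    return total;
  }
}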
The data collection should be optional: a job tracker should be able to turn
such collection off, and in that case it should not pay the cost.
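A minimal sketch of the opt-in switch, assuming a boolean configuration
property. The key name below is invented for illustration; a real patch would
choose its own:

import org.apache.hadoop.conf.Configuration;

public class StatsCollectionConfig {
  // Hypothetical property name. Defaulting to false means a tracker
  // that does not opt in pays no collection cost.
  public static boolean isEnabled(Configuration conf) {
    return conf.getBoolean("mapred.jobtracker.collect.stats", false);
  }
}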
The job tracker should organize the in-memory version of the collected data in
such a way that:
1. it does not consume an excessive amount of memory, and
2. the data is suitable for presentation through the web status pages.
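One possible way to satisfy the memory constraint, sketched here as an
assumption rather than a design decision, is to hold each time series in a
fixed-capacity ring buffer, flushing samples to DFS before they are
overwritten:

public class BoundedSeries {
  private final long[] values;
  private int next = 0;
  private int size = 0;

  public BoundedSeries(int capacity) {
    values = new long[capacity];
  }

  // Overwrites the oldest sample once the buffer is full, so a
  // long-running job tracker never grows this structure unboundedly.
  public synchronized void add(long v) {
    values[next] = v;
    next = (next + 1) % values.length;
    if (size < values.length) {
      size++;
    }
  }

  public synchronized int size() {
    return size;
  }
}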
The data saved to DFS files should be in the Hadoop record format.
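The write path might look roughly like the following. This is only a sketch of
the FileSystem plumbing, with hand-serialized fields and an invented output
path; the actual patch would serialize through the Hadoop record I/O classes
as stated above:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StatsWriter {
  public static void writeJobStats(Configuration conf, String jobId,
      long startTime, long endTime) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical output location, one file per job.
    Path out = new Path("/jobtracker/stats/" + jobId + ".stats");
    FSDataOutputStream stream = fs.create(out);
    try {
      stream.writeUTF(jobId);
      stream.writeLong(startTime);
      stream.writeLong(endTime);
    } finally {
      stream.close();
    }
  }
}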