[ 
https://issues.apache.org/jira/browse/HADOOP-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716222#action_12716222
 ] 

Sharad Agarwal commented on HADOOP-5931:
----------------------------------------

To collect stats for the last hour/day, we can keep a moving window for each 
time period. A moving window consists of multiple time slots, and the slot 
size decides the granularity at which the window moves/updates. The slot size 
can differ between windows. For example, the hour window could be updated 
every 5 minutes and the day window every hour. In that case the hour window 
would hold stats in 12 slots of 5 minutes each, and the day window would hold 
stats in 24 slots of 1 hour each.

Once the end of the last slot is crossed, a new slot is added and the oldest 
one is dropped, which moves the window forward by one slot.
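
A minimal sketch of such a window, to make the slot handling concrete. The 
class name MovingWindow, its methods, and the use of millisecond timestamps 
are assumptions for illustration only, not existing Hadoop code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical moving window made of fixed-size time slots. */
class MovingWindow {
    /** Counters for a single slot. */
    private static final class Slot {
        final long startMillis;
        int total;
        int succeeded;
        Slot(long startMillis) { this.startMillis = startMillis; }
    }

    private final long slotMillis;   // slot size, e.g. 5 minutes
    private final int numSlots;      // slots per window, e.g. 12
    private final Deque<Slot> slots = new ArrayDeque<Slot>();

    MovingWindow(long slotMillis, int numSlots) {
        this.slotMillis = slotMillis;
        this.numSlots = numSlots;
    }

    /** Records one finished task; timestamps are assumed not to decrease. */
    void record(long nowMillis, boolean success) {
        advance(nowMillis);
        Slot current = slots.peekLast();
        current.total++;
        if (success) {
            current.succeeded++;
        }
    }

    /** Adds a slot for the current time if needed and drops expired slots. */
    private void advance(long nowMillis) {
        long slotStart = nowMillis - (nowMillis % slotMillis);
        Slot last = slots.peekLast();
        if (last == null || last.startMillis < slotStart) {
            slots.addLast(new Slot(slotStart));
        }
        long oldestAllowed = slotStart - (numSlots - 1) * slotMillis;
        while (slots.peekFirst().startMillis < oldestAllowed) {
            slots.removeFirst();
        }
    }

    /** Total tasks seen inside the window as of nowMillis. */
    int total(long nowMillis) {
        advance(nowMillis);
        int sum = 0;
        for (Slot s : slots) {
            sum += s.total;
        }
        return sum;
    }

    /** Succeeded tasks seen inside the window as of nowMillis. */
    int succeeded(long nowMillis) {
        advance(nowMillis);
        int sum = 0;
        for (Slot s : slots) {
            sum += s.succeeded;
        }
        return sum;
    }
}
```

With this sketch the hour window would be new MovingWindow(5 * 60 * 1000L, 12) 
and the day window new MovingWindow(60 * 60 * 1000L, 24). Slots are evicted by 
their start time, so an idle period that skips several slots still expires old 
data correctly.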

A simple strategy could be to collect this information in the TaskTracker and 
report it to the JobTracker via TaskTrackerStatus. A subclass could be added to 
TaskTrackerStatus with fields, say:
tasksSinceStarted, tasksSucceededSinceStarted,
tasksInLastHour, tasksSucceededInLastHour,
tasksInLastDay, tasksSucceededInLastDay

To keep the heartbeat small, we need not send the above fields with every 
heartbeat. They could be reported only at a certain interval (typically the 
minimum slot size, 5 minutes in the above example).
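
One way this throttling might look, as a rough sketch. StatsReportThrottle and 
shouldReport are hypothetical names, and the idea that the heartbeat code 
would consult such a helper is an assumption, not existing Hadoop behaviour:

```java
/** Hypothetical helper deciding whether a heartbeat should carry the stats. */
class StatsReportThrottle {
    private final long reportIntervalMillis; // e.g. the minimum slot size
    private long lastReportMillis;
    private boolean reportedOnce;

    StatsReportThrottle(long reportIntervalMillis) {
        this.reportIntervalMillis = reportIntervalMillis;
    }

    /**
     * Returns true if the stats should be attached to the heartbeat sent at
     * nowMillis. The first heartbeat always carries them; later ones only
     * once the reporting interval has elapsed since the last report.
     */
    boolean shouldReport(long nowMillis) {
        if (!reportedOnce || nowMillis - lastReportMillis >= reportIntervalMillis) {
            reportedOnce = true;
            lastReportMillis = nowMillis;
            return true;
        }
        return false;
    }
}
```

With a 5 minute interval, heartbeats sent every few seconds would carry the 
extra fields roughly once per slot, so the JobTracker's view lags by at most 
one slot.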

An alternative would be to compute all this in the JobTracker. My vote goes to 
doing it in the TaskTracker, as this mostly concerns the individual TaskTracker 
and doesn't need any global information.

Thoughts?


> Collect information about number of tasks succeeded / total per time unit for 
> a tasktracker. 
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5931
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5931
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Hemanth Yamijala
>
> Collecting information of number of tasks succeeded / total per tasktracker 
> and being able to see these counts per hour, day and since start time will 
> help reason about things like the blacklisting strategy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
