[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864065#action_12864065
 ] 

Scott Chen commented on MAPREDUCE-220:
--------------------------------------

Hey guys, thanks for the help.

I am not familiar with the counters, but from Arun's and Vinod's comments I can 
see the benefits:
1. Reuse of the counter logging and transmission
2. Easier to expose to end users
This is really good!

But as Dhruba mentioned, we want to use this information for scheduling.
Measuring it and then sending it with the heartbeat ensures that the scheduler 
gets the latest information.
One minute may be too slow for scheduling.

The other question I have is: using counters, can we aggregate with other 
methods (e.g. max) rather than just incrementing values?

My original plan is to report this information in this issue and aggregate 
it into job-level status in MAPREDUCE-1739.
I am planning to generate these fields after aggregation:
1. Total CPU cycles (# of giga-cycles)
2. Total Memory occupied time (GB-sec)
3. Maximum peak memory on one task (GB)
4. Maximum peak CPU on one task (GHz)
Is it possible to get these fields by using the counters?
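
To make the question concrete, here is a minimal sketch of the aggregation 
I have in mind: fields 1 and 2 are sums over tasks, fields 3 and 4 are maxima. 
The class and field names (JobResourceAggregator, TaskUsage, JobUsage) are 
hypothetical, for illustration only; they are not part of the Hadoop counters 
API or of any attached patch.

```java
import java.util.List;

public class JobResourceAggregator {

    /** Hypothetical per-task report; the names are illustrative only. */
    public static final class TaskUsage {
        public final double cpuGigaCycles; // CPU cycles used by the task (giga-cycles)
        public final double memoryGbSec;   // memory occupied over time (GB-sec)
        public final double peakMemoryGb;  // peak memory of the task (GB)
        public final double peakCpuGhz;    // peak CPU rate of the task (GHz)

        public TaskUsage(double cpuGigaCycles, double memoryGbSec,
                         double peakMemoryGb, double peakCpuGhz) {
            this.cpuGigaCycles = cpuGigaCycles;
            this.memoryGbSec = memoryGbSec;
            this.peakMemoryGb = peakMemoryGb;
            this.peakCpuGhz = peakCpuGhz;
        }
    }

    /** Hypothetical job-level result holding the four aggregated fields. */
    public static final class JobUsage {
        public double totalCpuGigaCycles;
        public double totalMemoryGbSec;
        public double maxPeakMemoryGb;
        public double maxPeakCpuGhz;

        void add(TaskUsage t) {
            // Fields 1 and 2 aggregate by summation, like an incrementing counter.
            totalCpuGigaCycles += t.cpuGigaCycles;
            totalMemoryGbSec += t.memoryGbSec;
            // Fields 3 and 4 need a max, which a plain increment cannot express.
            maxPeakMemoryGb = Math.max(maxPeakMemoryGb, t.peakMemoryGb);
            maxPeakCpuGhz = Math.max(maxPeakCpuGhz, t.peakCpuGhz);
        }
    }

    /** Folds all task reports of one job into a single job-level summary. */
    public static JobUsage aggregate(List<TaskUsage> tasks) {
        JobUsage job = new JobUsage();
        for (TaskUsage t : tasks) {
            job.add(t);
        }
        return job;
    }
}
```

The two max fields are the part I am unsure counters can express, since they 
cannot be obtained by adding up per-task values.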

I will read the relevant code and think more about it.
Maybe there's a way to obtain both benefits.

Vinod: I also feel that there is a lot of redundant creation/computation of 
processTree.
Maybe we should refactor the code and use one thread to compute it and expose 
the information to others.
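
A rough sketch of the "one thread computes, others read" idea, assuming a 
generic sampler in place of the real processTree computation. 
SharedUsageMonitor and its methods are hypothetical names, not existing 
TaskTracker classes, and the sketch uses Java 8 syntax for brevity.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical helper: one thread samples usage, all readers share the result. */
public class SharedUsageMonitor<T> {
    // Stand-in for the expensive computation (e.g. walking the process tree).
    private final Supplier<T> sampler;
    // Most recent snapshot; readers never trigger a recomputation.
    private final AtomicReference<T> latest = new AtomicReference<>();
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();

    public SharedUsageMonitor(Supplier<T> sampler) {
        this.sampler = sampler;
    }

    /** Starts periodic sampling on the single background thread. */
    public void start(long periodMillis) {
        executor.scheduleAtFixedRate(this::refresh, 0, periodMillis,
                                     TimeUnit.MILLISECONDS);
    }

    /** Computes one snapshot and publishes it to all readers. */
    public void refresh() {
        latest.set(sampler.get());
    }

    /** Returns the last published snapshot, or null if none exists yet. */
    public T latest() {
        return latest.get();
    }

    /** Stops the background thread. */
    public void stop() {
        executor.shutdownNow();
    }
}
```

Consumers such as the memory manager or the heartbeat code would call 
latest() instead of building their own processTree. One caveat of this 
sketch: if the sampler throws, scheduleAtFixedRate suppresses later runs, so 
a real version would need to catch exceptions inside refresh().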



> Collecting cpu and memory usage for MapReduce tasks
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-220
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: task, tasktracker
>            Reporter: Hong Tang
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt
>
>
> It would be nice for TaskTracker to collect cpu and memory usage for 
> individual Map or Reduce tasks over time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
