[ https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887672#action_12887672 ]
M. C. Srivas commented on MAPREDUCE-220:
----------------------------------------

We've found that disk bandwidth is virtually unlimited compared to other factors, especially the network, so measuring and collecting it is not worthwhile for scheduling. More interesting is disk-ops-per-second-per-drive: it identifies bad data layout immediately (i.e., one disk will be very hot even though it might be transferring very little data). Unfortunately, using ops/second/disk to schedule work is still not very useful, since bad data layout will not change just because we schedule less.

The network is a big bottleneck, but bytes-in/bytes-out per unit of time is not representative of a problem. If we had some measure of congestion, we could use it to increase or decrease scheduling locality (e.g., if the network gets congested, reduce the percentage of non-local tasks). To derive a congestion metric we need to know round-trip times under "normal" vs. "congested" conditions, dropped-packet counts, retransmit counts, etc. (Perhaps add some sockopts to tell us this? TCP knows this, after all.)

CPU, memory, and swapping therefore still seem to be the most useful metrics.

> Collecting cpu and memory usage for MapReduce tasks
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-220
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: task, tasktracker
>            Reporter: Hong Tang
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt
>
>
> It would be nice for TaskTracker to collect cpu and memory usage for individual Map or Reduce tasks over time.
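As a rough illustration of the disk-ops-per-second-per-drive idea in the comment above, a TaskTracker-side collector could sample Linux /proc/diskstats. This is only a sketch under that assumption; the DiskOpsSampler class name and the sampling approach are illustrative and are not part of any patch attached to this issue.

{code:java}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: sample completed read+write operations per drive from Linux
 * /proc/diskstats, so a "hot" disk (many ops, little data) stands out.
 */
public class DiskOpsSampler {
  // last observed cumulative op count, keyed by device name
  private final Map<String, Long> lastOps = new HashMap<String, Long>();

  /** Returns ops completed since the previous call, keyed by device name. */
  public Map<String, Long> sampleOpsDelta() throws IOException {
    Map<String, Long> delta = new HashMap<String, Long>();
    BufferedReader in = new BufferedReader(new FileReader("/proc/diskstats"));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 11) {
          continue;                       // not a full per-device stats line
        }
        String dev = f[2];                // device name, e.g. "sda"
        // f[3] = reads completed, f[7] = writes completed (both cumulative)
        long ops = Long.parseLong(f[3]) + Long.parseLong(f[7]);
        Long prev = lastOps.put(dev, ops);
        if (prev != null) {
          delta.put(dev, ops - prev);
        }
      }
    } finally {
      in.close();
    }
    return delta;
  }
}
{code}

Dividing each delta by the sampling interval gives ops/second/drive; a drive whose op rate is far above its peers while moving little data is the bad-layout case described in the comment.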
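On the congestion side, the per-connection data the comment alludes to does exist in the kernel: on Linux the TCP_INFO socket option exposes the smoothed RTT and retransmit counts for a socket, but it is not reachable from pure Java. A hedged approximation is to watch the host-wide TCP counters in /proc/net/snmp and treat a rising retransmit ratio as a congestion signal; the TcpRetransSampler class below and the ratio it computes are illustrative only, not part of any attached patch.

{code:java}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/**
 * Sketch: read host-wide TCP counters from Linux /proc/net/snmp and
 * report the retransmit ratio since the previous sample as a cheap
 * congestion signal.
 */
public class TcpRetransSampler {
  private long lastRetrans = -1;
  private long lastOut = -1;

  /** Returns retransmitted/output segments since the last call, or -1 if unknown. */
  public double sampleRetransRatio() throws IOException {
    String header = null;
    String values = null;
    BufferedReader in = new BufferedReader(new FileReader("/proc/net/snmp"));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        if (!line.startsWith("Tcp:")) {
          continue;
        }
        if (header == null) {
          header = line;                  // first Tcp: line names the fields
        } else {
          values = line;                  // second Tcp: line carries the counters
          break;
        }
      }
    } finally {
      in.close();
    }
    if (header == null || values == null) {
      return -1;
    }
    String[] names = header.split("\\s+");
    String[] vals = values.split("\\s+");
    long retrans = -1, out = -1;
    for (int i = 0; i < names.length && i < vals.length; i++) {
      if ("RetransSegs".equals(names[i])) retrans = Long.parseLong(vals[i]);
      if ("OutSegs".equals(names[i])) out = Long.parseLong(vals[i]);
    }
    double ratio = -1;
    if (lastRetrans >= 0 && out > lastOut && retrans >= lastRetrans) {
      ratio = (double) (retrans - lastRetrans) / (out - lastOut);
    }
    lastRetrans = retrans;
    lastOut = out;
    return ratio;
  }
}
{code}

A scheduler could back off the percentage of non-local tasks when this ratio crosses a threshold, along the lines of the locality adjustment suggested in the comment.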