[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429562#comment-13429562 ]
Todd Lipcon commented on MAPREDUCE-4469:
----------------------------------------
The problem with only doing it at the end is that jobs like streaming then
won't account for their child processes: by the end of the task, the streaming
child process has already exited.
If we only want to count our own usage (and not that of our child hierarchy),
then we could avoid this whole complication and just read stats from
/proc/self/stat.
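
To make the /proc/self/stat option concrete, here is a rough sketch of what
reading it could look like. This is not the attached patch: the class name
SelfStat and its fields are hypothetical, and the field positions (utime=14,
stime=15, vsize=23, rss=24) are taken from the Linux proc(5) layout. It uses
java.nio.file, so it assumes Java 7 or later. Note that utime/stime here cover
only this process, not children such as a streaming subprocess.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: parses the handful of /proc/self/stat fields a task
// would need to report its own (non-child) resource usage.
public class SelfStat {
    public final long utimeTicks;  // user-mode CPU time, in clock ticks
    public final long stimeTicks;  // kernel-mode CPU time, in clock ticks
    public final long vsizeBytes;  // virtual memory size, in bytes
    public final long rssPages;    // resident set size, in pages

    public SelfStat(long utime, long stime, long vsize, long rss) {
        this.utimeTicks = utime;
        this.stimeTicks = stime;
        this.vsizeBytes = vsize;
        this.rssPages = rss;
    }

    // Parses one stat line. The comm field (field 2) is wrapped in parentheses
    // and may itself contain spaces or parentheses, so we split only the text
    // after the LAST ')'. That text starts at field 3 (state), hence field n
    // sits at index n - 3.
    public static SelfStat parse(String line) {
        int close = line.lastIndexOf(')');
        if (close < 0) {
            throw new IllegalArgumentException("unexpected stat format");
        }
        String[] f = line.substring(close + 1).trim().split("\\s+");
        if (f.length < 22) {
            throw new IllegalArgumentException("too few fields: " + f.length);
        }
        return new SelfStat(
            Long.parseLong(f[11]),   // field 14: utime
            Long.parseLong(f[12]),   // field 15: stime
            Long.parseLong(f[20]),   // field 23: vsize
            Long.parseLong(f[21]));  // field 24: rss
    }

    // Single small file read, instead of walking every entry under /proc.
    public static SelfStat readSelf() throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get("/proc/self/stat"));
        return parse(new String(raw, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        SelfStat s = readSelf();
        System.out.println("utime=" + s.utimeTicks + " stime=" + s.stimeTicks
            + " vsize=" + s.vsizeBytes + " rss=" + s.rssPages);
    }
}
```

The trade-off is the one described above: this is one read per sample, but it
would miss CPU and memory used by a streaming child process.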
> Resource calculation in child tasks is CPU-heavy
> ------------------------------------------------
>
> Key: MAPREDUCE-4469
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: performance, task
> Affects Versions: 1.0.3
> Reporter: Todd Lipcon
> Assignee: Ahmed Radwan
> Attachments: MAPREDUCE-4469.patch
>
>
> In doing some benchmarking on a hadoop-1 derived codebase, I noticed that
> each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed
> that it's spending a lot of time looping through all the files in /proc to
> calculate resource usage.
> As a test, I added a flag to disable use of the ResourceCalculatorPlugin
> within the tasks. On a CPU-bound 500G-sort workload, this improved total job
> runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).