[jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy

Ahmed Radwan (JIRA) Wed, 08 Aug 2012 15:27:23 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431462#comment-13431462
 ]


Ahmed Radwan commented on MAPREDUCE-4469:
-----------------------------------------

Many thanks, Todd! I agree. I looked more into how these values are updated; I 
had thought the streaming process was still accounted for because of the 
cumulative nature of how these values are calculated. For example, in 
getCumulativeCpuTime():

{code}
    cpuTime += incJiffies * JIFFY_LENGTH_IN_MILLIS;
    return cpuTime;
{code}

However, it seems that pTree and its values are only updated when 
getProcResourceValues() is called, and that method is only called from 
initialize() and updateResourceCounters() in Task.

So basically, any resource changes that happen between two calls of 
getProcResourceValues() won't be accounted for.
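To make the sampling gap concrete, here is a minimal Java sketch (not the actual ProcfsBasedProcessTree code). It assumes a hypothetical `SampledCpuTracker` class that, like the snippet above, only adds the per-process jiffy deltas it can see at each refresh. It also assumes 100 jiffies per second. A child process (for example a streaming subprocess) that starts and exits between two refreshes never appears in a snapshot, so its CPU time is never added.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of cumulative CPU accounting that only sees processes
// alive at refresh time. Not Hadoop code; names are illustrative.
class SampledCpuTracker {
    static final long JIFFY_LENGTH_IN_MILLIS = 10; // assumption: 100 jiffies/second

    // Jiffies each pid had accumulated at the previous refresh.
    private final Map<Integer, Long> lastJiffies = new HashMap<>();
    private long cpuTimeMillis = 0;

    /** Called on each refresh with a snapshot of the pids currently alive. */
    long refresh(Map<Integer, Long> liveJiffies) {
        long incJiffies = 0;
        for (Map.Entry<Integer, Long> e : liveJiffies.entrySet()) {
            long prev = lastJiffies.getOrDefault(e.getKey(), 0L);
            incJiffies += e.getValue() - prev; // delta since the last refresh
        }
        lastJiffies.clear();
        lastJiffies.putAll(liveJiffies);
        cpuTimeMillis += incJiffies * JIFFY_LENGTH_IN_MILLIS;
        return cpuTimeMillis;
    }

    public static void main(String[] args) {
        SampledCpuTracker t = new SampledCpuTracker();
        t.refresh(Map.of(100, 50L)); // task JVM (pid 100) has used 50 jiffies
        // Between refreshes, pid 200 runs for 30 jiffies and exits unseen.
        long total = t.refresh(Map.of(100, 80L)); // only pid 100 is visible
        System.out.println(total); // 800
    }
}
{code}

The second refresh reports 800 ms even though the two processes together used 110 jiffies (1100 ms); the 300 ms spent by the short-lived child is never added.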

Since this overhead is incurred on every update from the task, what if we add a 
new configuration property that defines how many updates to skip before the 
resource counters are refreshed? For example, by default the resource counters 
would be updated only every 10th update, but users could still configure the 
resolution of these updates through this property. What do you think?
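A minimal Java sketch of the proposed throttling, assuming a hypothetical property name `mapreduce.task.resource.update.interval` (not an existing Hadoop key) and a hypothetical `ResourceCounterThrottle` class. `java.util.Properties` stands in for the job configuration, and the expensive /proc walk is modeled as a `Runnable` that runs on the first update and then on every Nth update.

{code:java}
import java.util.Properties;

// Hypothetical throttle that refreshes resource counters only every N task
// updates. The property name is illustrative, not a real Hadoop key.
class ResourceCounterThrottle {
    static final String INTERVAL_KEY = "mapreduce.task.resource.update.interval"; // hypothetical
    static final int DEFAULT_INTERVAL = 10;

    private final int interval;
    private final Runnable expensiveUpdate; // stands in for the /proc walk
    private long updateCount = 0;

    ResourceCounterThrottle(Properties conf, Runnable expensiveUpdate) {
        int configured = Integer.parseInt(
            conf.getProperty(INTERVAL_KEY, Integer.toString(DEFAULT_INTERVAL)));
        this.interval = Math.max(1, configured); // 1 means "refresh on every update"
        this.expensiveUpdate = expensiveUpdate;
    }

    /** Called on every task progress update; returns true if counters were refreshed. */
    boolean onTaskUpdate() {
        boolean refresh = (updateCount % interval) == 0;
        updateCount++;
        if (refresh) {
            expensiveUpdate.run();
        }
        return refresh;
    }

    public static void main(String[] args) {
        int[] refreshes = {0};
        ResourceCounterThrottle t =
            new ResourceCounterThrottle(new Properties(), () -> refreshes[0]++);
        for (int i = 0; i < 25; i++) {
            t.onTaskUpdate();
        }
        System.out.println(refreshes[0]); // 3: updates 0, 10 and 20
    }
}
{code}

With the default of 10, 25 task updates trigger three refreshes; setting the property to 1 keeps the current every-update behavior. A real patch would presumably also force one last refresh when the task finishes, so that the final counters are not stale.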
                
> Resource calculation in child tasks is CPU-heavy
> ------------------------------------------------
>
>                 Key: MAPREDUCE-4469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: performance, task
>    Affects Versions: 1.0.3
>            Reporter: Todd Lipcon
>            Assignee: Ahmed Radwan
>         Attachments: MAPREDUCE-4469.patch
>
>
> In doing some benchmarking on a Hadoop-1-derived codebase, I noticed that 
> each of the child tasks was making a ton of syscalls. Upon stracing, I noticed 
> that it's spending a lot of time looping through all the files in /proc to 
> calculate resource usage.
> As a test, I added a flag to disable use of the ResourceCalculatorPlugin 
> within the tasks. On a CPU-bound 500G-sort workload, this improved total job 
> runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).
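The following Java sketch shows roughly why a procfs-based process-tree walk is syscall-heavy. It uses a hypothetical, Linux-only `ProcTreeScan` class and is not Hadoop's ProcfsBasedProcessTree. To find the descendants of a single task, it has to list every pid under /proc and open and read each /proc/<pid>/stat file. Field positions follow proc(5): ppid is field 4, and utime and stime are fields 14 and 15.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Illustrative model of a /proc process-tree scan; not Hadoop's code.
class ProcTreeScan {
    /** Parses a /proc/<pid>/stat line into {ppid, utime + stime} in jiffies. */
    static long[] parseStat(String line) {
        // comm (field 2) is parenthesized and may contain spaces: split after the last ')'.
        String[] f = line.substring(line.lastIndexOf(')') + 1).trim().split("\\s+");
        // f[0] is field 3 (state), so proc(5) field k is f[k - 3].
        long ppid = Long.parseLong(f[1]);
        long cpu = Long.parseLong(f[11]) + Long.parseLong(f[12]); // utime + stime
        return new long[] {ppid, cpu};
    }

    /** Sums CPU jiffies of rootPid and all its descendants; 0 if /proc is absent. */
    static long treeCpuJiffies(long rootPid) throws IOException {
        Path proc = Paths.get("/proc");
        if (!Files.isDirectory(proc)) {
            return 0;
        }
        List<Path> pidDirs = new ArrayList<>();
        try (Stream<Path> entries = Files.list(proc)) {
            entries.filter(p -> p.getFileName().toString().matches("\\d+"))
                   .forEach(pidDirs::add);
        }
        Map<Long, List<Long>> children = new HashMap<>();
        Map<Long, Long> cpu = new HashMap<>();
        for (Path dir : pidDirs) { // one open/read/close per process on the machine
            try {
                long[] s = parseStat(new String(Files.readAllBytes(dir.resolve("stat"))));
                long pid = Long.parseLong(dir.getFileName().toString());
                cpu.put(pid, s[1]);
                children.computeIfAbsent(s[0], k -> new ArrayList<>()).add(pid);
            } catch (IOException | RuntimeException e) {
                // The process may have exited after the listing; skip it.
            }
        }
        long total = 0;
        Deque<Long> todo = new ArrayDeque<>();
        todo.push(rootPid);
        while (!todo.isEmpty()) { // walk the subtree rooted at rootPid
            long pid = todo.pop();
            total += cpu.getOrDefault(pid, 0L);
            todo.addAll(children.getOrDefault(pid, List.of()));
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        long self = ProcessHandle.current().pid();
        System.out.println("CPU jiffies for this JVM's tree: " + treeCpuJiffies(self));
    }
}
{code}

Repeating this full scan on every task progress update is the per-update overhead that a skip interval would amortize.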
