[jira] [Updated] (TEZ-1074) DAGAppMaster takes lots of CPU when running a reasonably large job in the cluster

Rajesh Balamohan (JIRA) Sun, 20 Apr 2014 09:19:18 -0700

     [ https://issues.apache.org/jira/browse/TEZ-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Rajesh Balamohan updated TEZ-1074:
----------------------------------

    Description: 
- Ran a job which used 200 containers.
- DAGAppMaster was running at 70% CPU most of the time during the job.
- Profiling revealed that a lot of time was spent in TezEvent.readFields --> ... 
--> TaskStatusUpdateEvent.readFields().
- The default is "tez.task.am.heartbeat.interval-ms.max=100" (ms), i.e. roughly 
10 heartbeats per second per container.  With 200 containers, DAGAppMaster 
would potentially process 2000 events per second, each carrying TezCounters.
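
The estimate above can be reproduced with a short sketch. It is plain 
arithmetic and does not use the Tez API; the class and method names are 
hypothetical, and it assumes every container heartbeats once per interval.

```java
// Hypothetical helper, not part of Tez: estimates the status-update rate
// the application master would see if each container sends one event per
// heartbeat interval.
final class HeartbeatRate {

    // containers: number of running containers
    // heartbeatIntervalMs: heartbeat interval in milliseconds (must be > 0)
    static long eventsPerSecond(int containers, long heartbeatIntervalMs) {
        if (heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return containers * 1000L / heartbeatIntervalMs;
    }

    public static void main(String[] args) {
        // 200 containers, 100 ms interval, as in the report
        System.out.println(eventsPerSecond(200, 100)); // prints 2000
    }
}
```

Under the same assumption, raising the interval to 1000 ms would bring the 
estimate down to 200 events per second.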

With a large job, the CPU usage of DAGAppMaster can grow significantly.  

One option to reduce CPU usage could be to send only the modified TezCounters 
in TaskStatusUpdateEvent.
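
A minimal sketch of that idea, assuming counters can be viewed as a simple 
name-to-value map.  It does not use the real TezCounters or 
TaskStatusUpdateEvent classes; CounterDelta and changedSince are hypothetical 
names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical task-side tracker, not part of Tez: remembers the counter
// values already reported and returns only the entries that changed.
final class CounterDelta {

    private final Map<String, Long> lastSent = new HashMap<>();

    // Returns the counters whose value differs from the last reported value
    // (or that were never reported) and records them as reported.
    Map<String, Long> changedSince(Map<String, Long> current) {
        Map<String, Long> delta = new HashMap<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long previous = lastSent.get(e.getKey());
            if (previous == null || !previous.equals(e.getValue())) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        lastSent.putAll(delta);
        return delta;
    }

    public static void main(String[] args) {
        CounterDelta tracker = new CounterDelta();
        Map<String, Long> counters = new HashMap<>();
        counters.put("RECORDS_READ", 10L);
        counters.put("BYTES_READ", 4096L);

        System.out.println(tracker.changedSince(counters).size()); // prints 2
        System.out.println(tracker.changedSince(counters).size()); // prints 0

        counters.put("RECORDS_READ", 25L);
        System.out.println(tracker.changedSince(counters)); // prints {RECORDS_READ=25}
    }
}
```

With such a scheme the receiver would have to merge each delta into the 
counters it already holds, so a lost or reordered update needs separate 
handling; that part is not covered here.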

  was:
- Ran a job which used 200 containers.
- DAGAppMaster was running at 70% CPU most of the time during the job.
- Profiling revealed that a lot of time was spent in TezEvent.readFields --> ... 
--> TaskStatusUpdateEvent.readFields().
- The default is "tez.task.am.heartbeat.interval-ms.max=100" (ms), i.e. roughly 
10 heartbeats per second per container.  With 200 containers, DAGAppMaster 
would potentially process 2000 events per second, each carrying TezCounters.

With a large job, the CPU usage can grow significantly.  

One option to reduce CPU usage could be to send only the modified TezCounters 
in TaskStatusUpdateEvent.


> DAGAppMaster takes lots of CPU when running a reasonably large job in the 
> cluster
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-1074
>                 URL: https://issues.apache.org/jira/browse/TEZ-1074
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: Screen Shot 2014-04-19 at 7.26.36 PM.png
>
>
> - Ran a job which used 200 containers.
> - DAGAppMaster was running at 70% CPU most of the time during the job.
> - Profiling revealed that a lot of time was spent in TezEvent.readFields --> 
> ... --> TaskStatusUpdateEvent.readFields().
> - The default is "tez.task.am.heartbeat.interval-ms.max=100" (ms), i.e. 
> roughly 10 heartbeats per second per container.  With 200 containers, 
> DAGAppMaster would potentially process 2000 events per second, each 
> carrying TezCounters.
> With a large job, the CPU usage of DAGAppMaster can grow significantly.  
> One option to reduce CPU usage could be to send only the modified 
> TezCounters in TaskStatusUpdateEvent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
