[ https://issues.apache.org/jira/browse/TEZ-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978847#comment-13978847 ]
Rajesh Balamohan commented on TEZ-1074:
---------------------------------------

If we send only the updated counters, we need to iterate through and merge all the counters back in the STATUS_UPDATER (an expensive operation). Otherwise the AM would end up with a half-cooked set of counters. Since we send counters every second, I have removed the counter update portion in the latest patch.

> DAGAppMaster takes lots of CPU when running a reasonably large job in the cluster
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-1074
>                 URL: https://issues.apache.org/jira/browse/TEZ-1074
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: Screen Shot 2014-04-19 at 7.26.36 PM.png, TEZ-1074-v1.patch, TEZ-1074-v2.patch, TEZ-1074-v7.patch, TEZ-1074-v8.patch
>
>
> - Ran a job which used 200 containers.
> - DAGAppMaster was running at 70% CPU most of the time during the job.
> - Profiling revealed that lots of time was spent on TezEvent.readFields --> ... --> TaskStatusUpdateEvent.readFields().
> - Default "tez.task.am.heartbeat.interval-ms.max=100" ms. With 200 containers each sending a heartbeat every 100 ms, potentially 2000 events (these events have TezCounters) per second would be processed by DAGAppMaster.
> With a large job, CPU usage of DAGAppMaster can bloat up significantly.
> One option to reduce CPU usage could be to send only the modified TezCounters in TaskStatusUpdateEvent.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
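
For illustration only, the trade-off discussed in the comment (sending only the changed counters versus merging them back on the AM side) can be sketched roughly as below. This is not code from any TEZ-1074 patch. The class CounterDeltaSketch and its methods changedSince and mergeInto are hypothetical, and a flat Map<String, Long> stands in for TezCounters, whose real API is grouped and differs from this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a flat name -> value map stands in for TezCounters.
final class CounterDeltaSketch {

    // Task side: keep only counters that are new or whose value changed
    // since the last heartbeat that was sent.
    static Map<String, Long> changedSince(Map<String, Long> lastSent,
                                          Map<String, Long> current) {
        Map<String, Long> delta = new HashMap<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            // Long.equals(null) is false, so unseen counters are included.
            if (!e.getValue().equals(lastSent.get(e.getKey()))) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        return delta;
    }

    // AM side: fold a partial update into the full set kept per task attempt.
    // Without this merge the AM would hold only the counters from the most
    // recent partial update.
    static void mergeInto(Map<String, Long> amView, Map<String, Long> delta) {
        amView.putAll(delta);
    }

    public static void main(String[] args) {
        Map<String, Long> lastSent = new HashMap<>();
        lastSent.put("RECORDS_IN", 100L);
        lastSent.put("BYTES_READ", 4096L);

        Map<String, Long> current = new HashMap<>(lastSent);
        current.put("RECORDS_IN", 250L); // changed since last heartbeat

        Map<String, Long> amView = new HashMap<>(lastSent);
        Map<String, Long> delta = changedSince(lastSent, current);
        mergeInto(amView, delta);

        System.out.println("counters sent: " + delta.size());
        System.out.println("AM view size:  " + amView.size());
    }
}
```

The sketch shows where the cost moves: the heartbeat payload shrinks, but the receiver has to walk and merge every partial update to keep a complete counter set.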