[ 
https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2565:
----------------------------------
    Attachment: TEZ-2565.4.patch

- Simplified constructStatistics() by returning just completedTasksStatsCache.
All changes to completedTasksStatsCache are made in TaskCompletedTransition and
TaskRescheduledTransition. This should take care of the thread-safety issues.
- Agreed that this is an incompatible change, as it now returns only completed
task details. I have documented this in the API and will add it to CHANGES.txt
during commit.
- The improvement is mainly in reduced GC time. Earlier, every call to
constructStats() created a new hashmap and iterated through the
running/completed stats. Since the number of invocations was very high in
certain cases, this showed up as a hotspot. Now only completed task stats are
populated, so constructStats() adds little overhead.
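
The caching idea described above can be sketched roughly as follows. This is
only an illustration, not the code in the patch: VertexStatsSketch and its
method names are hypothetical stand-ins, a single long counter stands in for
the real per-task statistics, and the sketch does no locking of its own (it
assumes the caller serializes updates and reads).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a vertex; NOT Tez's actual VertexImpl.
class VertexStatsSketch {
    // Per-task output bytes, kept for completed tasks only.
    private final Map<Integer, Long> completedTaskBytes = new HashMap<>();
    // Running aggregate, updated only when a task completes or is rescheduled.
    private long completedBytesTotal = 0L;

    // Assumed to be called from the "task completed" transition.
    void onTaskCompleted(int taskId, long outputBytes) {
        Long previous = completedTaskBytes.put(taskId, outputBytes);
        if (previous != null) {
            completedBytesTotal -= previous; // replace an earlier attempt's stats
        }
        completedBytesTotal += outputBytes;
    }

    // Assumed to be called from the "task rescheduled" transition.
    void onTaskRescheduled(int taskId) {
        Long previous = completedTaskBytes.remove(taskId);
        if (previous != null) {
            completedBytesTotal -= previous; // the task no longer counts as completed
        }
    }

    // Cheap read: no map allocation and no scan over tasks per call.
    long constructStatistics() {
        return completedBytesTotal;
    }
}
```

With this shape, the cost of merging is paid once per task state change rather
than once per statistics request, which is the trade-off described above.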

> Consider scanning unfinished tasks in VertexImpl::constructStatistics to 
> reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2565
>                 URL: https://issues.apache.org/jira/browse/TEZ-2565
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2565.1.patch, TEZ-2565.2.patch, TEZ-2565.3.patch, 
> TEZ-2565.4.patch, cpu_usage_with_patch.png, cpu_usage_without_patch.png, 
> mem_usage_with_patch.png, mem_usage_without_patch.png
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging 
> stats) depending on the number of invocations to Vertex::getStatistics().  
> Consider scanning only unfinished tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
