[ https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-2565:
----------------------------------
Attachment: TEZ-2565.4.patch
- Simplified constructStatistics by returning just completedTasksStatsCache.
All changes to completedTasksStatsCache are done in TaskCompletedTransition &
TaskRescheduledTransition. This should take care of the thread-safety issues.
- Agreed that this is an incompatible change, as it returns only completed task
details. I have documented this in the API and will add it to CHANGES.txt during commit.
- The improvement is mainly in reducing GC time. Earlier, every call to
constructStatistics() created a new HashMap and iterated through all running and
completed task stats. Since the number of invocations was very high in certain
cases, this showed up as a hotspot. Now only completed task stats are populated,
so constructStatistics() has very little overhead.
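A minimal sketch of the caching idea described in the list above, assuming a simplified model: `TaskStats`, `VertexStatsSketch`, `onTaskCompleted` and `onTaskRescheduled` are hypothetical names, not Tez classes, and the `ReentrantReadWriteLock` stands in for whatever locking the real vertex state machine uses. Completion folds a task's stats into the cache once; a reschedule removes that task's contribution by rebuilding from the remaining completed tasks; the read path just returns the cached, immutable aggregate, which is why it reports completed tasks only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified per-task statistics (output bytes and records only).
final class TaskStats {
    static final TaskStats EMPTY = new TaskStats(0, 0);

    final long outputBytes;
    final long outputRecords;

    TaskStats(long outputBytes, long outputRecords) {
        this.outputBytes = outputBytes;
        this.outputRecords = outputRecords;
    }

    // Returns a new object; instances are immutable, so they can be shared with readers.
    TaskStats merge(TaskStats other) {
        return new TaskStats(outputBytes + other.outputBytes,
                             outputRecords + other.outputRecords);
    }
}

// Hypothetical vertex that keeps the completed-task aggregate up to date from
// its transitions instead of rebuilding it on every read.
final class VertexStatsSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, TaskStats> completedTaskStats = new HashMap<>();
    private TaskStats completedTasksStatsCache = TaskStats.EMPTY;

    // Analogue of TaskCompletedTransition: fold the finished task into the cache.
    void onTaskCompleted(String taskId, TaskStats stats) {
        lock.writeLock().lock();
        try {
            TaskStats previous = completedTaskStats.put(taskId, stats);
            if (previous == null) {
                completedTasksStatsCache = completedTasksStatsCache.merge(stats);
            } else {
                rebuildCache(); // same task reported twice: recompute to stay exact
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Analogue of TaskRescheduledTransition: the task is no longer completed,
    // so rebuild the aggregate from the tasks that are still completed.
    void onTaskRescheduled(String taskId) {
        lock.writeLock().lock();
        try {
            if (completedTaskStats.remove(taskId) != null) {
                rebuildCache();
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Analogue of constructStatistics(): no scan and no allocation per call.
    TaskStats constructStatistics() {
        lock.readLock().lock();
        try {
            return completedTasksStatsCache;
        } finally {
            lock.readLock().unlock();
        }
    }

    private void rebuildCache() {
        TaskStats total = TaskStats.EMPTY;
        for (TaskStats s : completedTaskStats.values()) {
            total = total.merge(s);
        }
        completedTasksStatsCache = total;
    }
}
```

Repeated reads return the same cached object until a completion or reschedule replaces it, so frequent getStatistics()-style calls generate no garbage. Rebuilding on reschedule (rather than subtracting) keeps the sketch correct even for aggregates such as min/max that cannot be undone incrementally; reschedules are assumed to be rare compared with reads.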
> Consider scanning unfinished tasks in VertexImpl::constructStatistics to
> reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-2565
> URL: https://issues.apache.org/jira/browse/TEZ-2565
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2565.1.patch, TEZ-2565.2.patch, TEZ-2565.3.patch,
> TEZ-2565.4.patch, cpu_usage_with_patch.png, cpu_usage_without_patch.png,
> mem_usage_with_patch.png, mem_usage_without_patch.png
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging
> stats) depending on the number of invocations to Vertex::getStatistics().
> Consider scanning only unfinished tasks.
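The issue description proposes a middle ground: fold each task into a running total once it finishes, and scan only the still-unfinished tasks on each read. A rough sketch under stated assumptions: `UnfinishedScanVertex` and its methods are hypothetical, statistics are reduced to a single output-byte counter, and the class is not synchronized (it assumes a single event-dispatch thread or an external lock).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical vertex illustrating the proposal: completed tasks are folded into
// a total once, and each read scans only the tasks that are still unfinished.
// Not thread-safe; assumes a single dispatcher thread or an external lock.
final class UnfinishedScanVertex {
    private final Map<String, Long> unfinishedTaskBytes = new HashMap<>();
    private long completedTaskBytes = 0;
    long tasksScannedLastCall = 0; // exposed only to show how much work a read does

    void onTaskStarted(String taskId) {
        unfinishedTaskBytes.put(taskId, 0L);
    }

    // Latest cumulative value reported by a still-running task.
    void onTaskProgress(String taskId, long outputBytes) {
        unfinishedTaskBytes.replace(taskId, outputBytes);
    }

    // Move the task out of the unfinished set and add its final value to the total.
    void onTaskCompleted(String taskId, long finalOutputBytes) {
        unfinishedTaskBytes.remove(taskId);
        completedTaskBytes += finalOutputBytes;
    }

    // Completed total plus a scan over unfinished tasks only.
    long constructStatistics() {
        long total = completedTaskBytes;
        tasksScannedLastCall = 0;
        for (long bytes : unfinishedTaskBytes.values()) {
            total += bytes;
            tasksScannedLastCall++;
        }
        return total;
    }
}
```

Unlike an approach that returns only the completed-task cache, this variant still includes partial stats from running tasks, so it keeps the older semantics; the trade-off is that each read remains proportional to the number of unfinished tasks.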
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)