[
https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599784#comment-14599784
]
Siddharth Seth commented on TEZ-2565:
-------------------------------------
- I think this is missing the actual increment of the stats in the
TaskCompleteTransition - even though it's marking the task as aggregated in
completedStats ?
- In constructStats - there's no check on whether the task being aggregated has
already been set in the current copy of completedTaskStatistics. This build
should only be required if completedTaskStats is null - since task completion
updates the stats for a completed task.
- For the null check to work, RescheduledTransition should set taskStatistics
to null instead of creating a new object.
Since completedTasksStatsCache is being returned in getStatistics - will have
to be careful about how the structure in OutputStatistics gets modified in
TEZ-2496. For the current fields it seems to be safe (via volatile longs). It
is possible for a read to happen on the OuputStats instance while it's being
updated by another task completing.
Thread safety of getStatistics - especially w.r.t construction of getStatistics
would need to be docced.
Otherwise, this looks good to me.
> Consider scanning unfinished tasks in VertexImpl::constructStatistics to
> reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-2565
> URL: https://issues.apache.org/jira/browse/TEZ-2565
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2565.1.patch, TEZ-2565.2.patch, TEZ-2565.3.patch,
> cpu_usage_with_patch.png, cpu_usage_without_patch.png,
> mem_usage_with_patch.png, mem_usage_without_patch.png
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging
> stats) depending on the number of invocations to Vertex::getStatistics().
> Consider scanning only unfinished tasks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)