[
https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596998#comment-14596998
]
Bikas Saha commented on TEZ-2565:
---------------------------------
Having both clear() and setupIOStats() [which checks size()] methods may be
error-prone. Could we simply discard the old cache object and create a new one
upon task re-scheduling? Doing this may also help with the next comment. We
would need to take the writeLock whenever we edit the cache, but that edit
(addition of a new task) could be moved to TaskCompletedTransition.
Why the mix of synchronization via the writeLock and a synchronized block in
constructStats? If we discard the old object, could we get rid of this mix?
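A minimal sketch of the suggestion, assuming hypothetical names (`TaskStats`, `VertexStatsSketch`, `onTaskStarted`, `onTaskCompleted`, `onTaskRescheduled`) rather than the actual VertexImpl or patch code. The cache is only edited while holding the write lock, a re-schedule replaces the cache with a new object instead of calling clear(), and constructStatistics() reads under the read lock, so no separate synchronized block is needed. Rebuilding the new cache from the remaining completed tasks is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for per-task IO statistics (not the real Tez class).
final class TaskStats {
    long bytesRead;
    long bytesWritten;

    TaskStats(long bytesRead, long bytesWritten) {
        this.bytesRead = bytesRead;
        this.bytesWritten = bytesWritten;
    }

    void mergeFrom(TaskStats other) {
        bytesRead += other.bytesRead;
        bytesWritten += other.bytesWritten;
    }
}

// Hypothetical vertex keeping one pre-merged object for completed tasks.
final class VertexStatsSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Integer, TaskStats> running = new HashMap<>();
    private final Map<Integer, TaskStats> completed = new HashMap<>();
    // Merged stats of all completed tasks; replaced, never cleared in place.
    private TaskStats completedCache = new TaskStats(0, 0);

    void onTaskStarted(int taskId, TaskStats stats) {
        lock.writeLock().lock();
        try {
            running.put(taskId, stats);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Would live in TaskCompletedTransition: the only incremental cache edit.
    void onTaskCompleted(int taskId) {
        lock.writeLock().lock();
        try {
            TaskStats stats = running.remove(taskId);
            if (stats != null) {
                completed.put(taskId, stats);
                completedCache.mergeFrom(stats);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Re-scheduling: discard the old cache object and build a new one from
    // the remaining completed tasks (no clear() or size() checks needed).
    void onTaskRescheduled(int taskId, TaskStats freshStats) {
        lock.writeLock().lock();
        try {
            completed.remove(taskId);
            running.put(taskId, freshStats);
            TaskStats rebuilt = new TaskStats(0, 0);
            for (TaskStats s : completed.values()) {
                rebuilt.mergeFrom(s);
            }
            completedCache = rebuilt;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read path: a single lock discipline, no extra synchronized block.
    TaskStats constructStatistics() {
        lock.readLock().lock();
        try {
            TaskStats result = new TaskStats(0, 0);
            for (TaskStats s : running.values()) {
                result.mergeFrom(s); // only unfinished tasks are scanned
            }
            result.mergeFrom(completedCache);
            return result;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Rebuilding costs one pass over the completed tasks, which is only paid on a re-schedule rather than on every getStatistics() call; whether that trade-off holds depends on how often completed tasks are re-run.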
Won't this merge completed stats even when the enum set does not contain
completed?
{code}
+ //Internally merges only when completed stats has some data.
+ stats.mergeFrom(completedTasksStatsCache);
+ return stats;
{code}
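If the merge is meant to honor the requested task states, a guard along these lines would skip the completed-task cache when the caller did not ask for it. `TaskStateFilter`, `Stats`, `FilteredStats` and `statsFor` are hypothetical names for illustration; the enum set used by the patch may have different values.

```java
import java.util.EnumSet;
import java.util.List;

// Hypothetical filter; the patch's actual enum may differ.
enum TaskStateFilter { RUNNING, COMPLETED }

// Hypothetical stats holder with a single counter.
final class Stats {
    long recordsProcessed;

    Stats(long recordsProcessed) {
        this.recordsProcessed = recordsProcessed;
    }

    void mergeFrom(Stats other) {
        recordsProcessed += other.recordsProcessed;
    }
}

final class FilteredStats {
    // Merge running-task stats and/or the completed cache, depending on
    // which states the caller requested.
    static Stats statsFor(EnumSet<TaskStateFilter> states,
                          List<Stats> runningTaskStats,
                          Stats completedTasksStatsCache) {
        Stats stats = new Stats(0);
        if (states.contains(TaskStateFilter.RUNNING)) {
            for (Stats s : runningTaskStats) {
                stats.mergeFrom(s);
            }
        }
        // Only fold in completed stats when COMPLETED was requested.
        if (states.contains(TaskStateFilter.COMPLETED)) {
            stats.mergeFrom(completedTasksStatsCache);
        }
        return stats;
    }
}
```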
Not sure how this is more CPU-efficient when all tasks of a vertex are still
running and the VM is invoking getStatistics(). Do you have any intuition on
this, or profiling results? If this is still an issue for running tasks, then I
am fine with adding a new API that gets stats by task state.
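One way to frame the CPU question is to count merges per getStatistics() call. With N tasks of which k have completed, a full scan merges N task objects, while the cached approach merges the N - k unfinished tasks plus one cache object (only when k > 0, per the comment in the patch snippet). The hypothetical `MergeCost` helper below is just this arithmetic, not a benchmark, and it ignores differences in per-merge cost.

```java
// Hypothetical cost model: number of mergeFrom() calls per getStatistics().
final class MergeCost {
    // Full scan: every task's stats are merged on each call.
    static int fullScan(int totalTasks) {
        return totalTasks;
    }

    // Cached scan: unfinished tasks are merged individually, plus one merge
    // of the completed-task cache when it holds data (k > 0).
    static int cachedScan(int totalTasks, int completedTasks) {
        int running = totalTasks - completedTasks;
        return running + (completedTasks > 0 ? 1 : 0);
    }

    public static void main(String[] args) {
        int n = 1000;
        for (int k : new int[] {0, 500, 1000}) {
            System.out.printf("k=%d full=%d cached=%d%n",
                    k, fullScan(n), cachedScan(n, k));
        }
    }
}
```

For n = 1000 this prints full=1000 for every k and cached=1000, 501 and 1 for k = 0, 500 and 1000: the savings only appear once tasks complete, which is why the all-running case is the one in question.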
> Consider scanning unfinished tasks in VertexImpl::constructStatistics to
> reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-2565
> URL: https://issues.apache.org/jira/browse/TEZ-2565
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2565.1.patch, TEZ-2565.2.patch
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging
> stats) depending on the number of invocations to Vertex::getStatistics().
> Consider scanning only unfinished tasks.