[
https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592315#comment-14592315
]
Siddharth Seth commented on TEZ-2565:
-------------------------------------
Looking at this as part of TEZ-2496. Comments.
- Nit: "final Set<TezTaskID> taskSet;" - this could be a bitset
- "ioStats.putIfAbsent(name, new IOStatisticsImpl());" - Not a concurrentMap. A
put should be sufficient ? There's other issues if we expect concurrent
invocation of this method.
- {code}if (currentStatistics.containsTask(t.getTaskId())) { break;}{code}
Should be a continue so that it moves forward and aggregates statistics for
newly completed tasks.
- If we're only going to aggregate statistics when a task completes, this
should be called out on the API. In theory, it's possible to report stats as a
task makes progress. It would make sense to do this if progress was also
available. Since we're not doing that at the moment - calling it out on the API
javadocs will be useful.
- The change isn't thread safe if getStatistics can be called concurrently -
since it's modifying stats under a read lock. I don't think that's possible
right now since it's invoked from the VertexManager or when the Vertex
completes under a write lock. This would need it's own lock while constructing
stats if concurrent calls are possible - for example in the future if the Edge
is allowed to access stats, or stats are pushed to consumers. This should
ideally be documented.
> Consider scanning unfinished tasks in VertexImpl::constructStatistics to
> reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-2565
> URL: https://issues.apache.org/jira/browse/TEZ-2565
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2565.1.patch
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging
> stats) depending on the number of invocations to Vertex::getStatistics().
> Consider scanning only unfinished tasks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)