[ 
https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592315#comment-14592315
 ] 

Siddharth Seth commented on TEZ-2565:
-------------------------------------

Looking at this as part of TEZ-2496. Comments.
- Nit: "final Set<TezTaskID> taskSet;" - this could be a bitset
- "ioStats.putIfAbsent(name, new IOStatisticsImpl());" - Not a concurrentMap. A 
put should be sufficient ? There's other issues if we expect concurrent 
invocation of this method.
- {code}if (currentStatistics.containsTask(t.getTaskId())) { break;}{code} 
Should be a continue so that it moves forward and aggregates statistics for 
newly completed tasks.
- If we're only going to aggregate statistics when a task completes, this 
should be called out on the API. In theory, it's possible to report stats as a 
task makes progress. It would make sense to do this if progress was also 
available. Since we're not doing that at the moment - calling it out on the API 
javadocs will be useful.

- The change isn't thread safe if getStatistics can be called concurrently - 
since it's modifying stats under a read lock. I don't think that's possible 
right now since it's invoked from the VertexManager or when the Vertex 
completes under a write lock. This would need it's own lock while constructing 
stats if concurrent calls are possible - for example in the future if the Edge 
is allowed to access stats, or stats are pushed to consumers. This should 
ideally be documented.

> Consider scanning unfinished tasks in VertexImpl::constructStatistics to 
> reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2565
>                 URL: https://issues.apache.org/jira/browse/TEZ-2565
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2565.1.patch
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging 
> stats) depending on the number of invocations to Vertex::getStatistics().  
> Consider scanning only unfinished tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to