[ 
https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598147#comment-14598147
 ] 

Bikas Saha commented on TEZ-2565:
---------------------------------

bq.  won't there be a condition where "VM asks for statistics and in the middle 
of executing it. And due to some failure, the stats cache gets recreated as a 
part of TaskRescheduledTransition."
The readLock() in getStatistics() and the writeLock() in 
TaskRescheduledTransition should prevent race conditions, right?
Also, the cache can be updated during TaskCompletedTransition to amortize some 
of the cost and to simplify the code for on-the-fly construction, which would 
then only need to handle running tasks (if needed).
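
To make the suggestion concrete, here is a minimal sketch of that locking 
scheme. It is not the actual VertexImpl code: the class StatsCacheSketch, its 
method names, and the single long counter standing in for the real statistics 
object are all hypothetical. Reads take the read lock; completion and 
reschedule take the write lock, so a reader never sees a half-rebuilt cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a vertex-level stats cache (not Tez's VertexImpl).
class StatsCacheSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Per-task contributions, kept so the cache can be rebuilt on reschedule.
    private final Map<Integer, Long> completed = new HashMap<>();
    private final Map<Integer, Long> running = new HashMap<>();
    // Merged stats of all completed tasks (the "cache").
    private long cachedCompleted = 0L;

    // A running task reports its latest counter value.
    void taskProgress(int taskId, long value) {
        lock.writeLock().lock();
        try {
            running.put(taskId, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Analogue of TaskCompletedTransition: fold the task into the cache once.
    void taskCompleted(int taskId, long finalValue) {
        lock.writeLock().lock();
        try {
            running.remove(taskId);
            Long previous = completed.put(taskId, finalValue);
            cachedCompleted += finalValue - (previous == null ? 0L : previous);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Analogue of TaskRescheduledTransition: drop the task and rebuild the cache.
    void taskRescheduled(int taskId) {
        lock.writeLock().lock();
        try {
            completed.remove(taskId);
            long sum = 0L;
            for (long v : completed.values()) {
                sum += v;
            }
            cachedCompleted = sum;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Analogue of getStatistics(): cached part plus a scan of running tasks only.
    long getStatistics() {
        lock.readLock().lock();
        try {
            long total = cachedCompleted;
            for (long v : running.values()) {
                total += v;
            }
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this shape, the cost of getStatistics() grows with the number of running 
tasks rather than with the total number of tasks. That is also why it would 
still be expensive when running tasks >> completed tasks.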

bq. Without partition stats, currently it has only 2 setters in IOStatistics 
which wouldn't show up as high CPU in existing code.
Sorry, I am not clear on this. Right now, partition stats are not present in 
the patch, so CPU is low. But once partition stats are added, wouldn't CPU 
consumption become expensive (as you have already observed) when running 
tasks >> completed tasks?

> Consider scanning unfinished tasks in VertexImpl::constructStatistics to 
> reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2565
>                 URL: https://issues.apache.org/jira/browse/TEZ-2565
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2565.1.patch, TEZ-2565.2.patch
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging 
> stats) depending on the number of invocations to Vertex::getStatistics().  
> Consider scanning only unfinished tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
