[ 
https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598876#comment-14598876
 ] 

Siddharth Seth commented on TEZ-2565:
-------------------------------------

We don't use or report statistics for running tasks yet. Given that, I'd say we 
should only report statistics for completed tasks in the current APIs. That 
gets rid of the processing overhead, as well as the huge memory overhead 
that'll come in with TEZ-2496.
Once we have a use case for running stats, they can be introduced with a little 
additional thought, such as how to handle progress. They're not likely to be 
useful without knowing how much progress tasks / vertices have made.

That, along with aggregating stats on task completion and not invalidating them 
on failure (see review comments on TEZ-2496), should reduce the overhead 
significantly as well as simplify the patch.
It can be done as a follow-up, to get this and TEZ-2496 in without multiple 
rounds of review.
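
To make the suggestion concrete, here is a minimal sketch of aggregating stats 
once per task completion, so that reading the vertex statistics never rescans 
tasks. It is not the TEZ-2565 or TEZ-2496 patch: the class CompletedTaskStats, 
its method names, and the Map<String, Long> stats shape are all hypothetical 
stand-ins, not Tez's actual VertexImpl or statistics types. Leaving the totals 
untouched on failure is only one possible reading of "not invalidating them on 
failure".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the Tez code base. */
public class CompletedTaskStats {
    // Running totals per stat name, covering completed tasks only.
    private final Map<String, Long> totals = new HashMap<>();
    private int completedTasks = 0;

    /** Called once when a task completes: fold its final stats into the totals. */
    public synchronized void onTaskCompleted(Map<String, Long> taskStats) {
        for (Map.Entry<String, Long> e : taskStats.entrySet()) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
        completedTasks++;
    }

    /** Assumption: a failure does not invalidate what was already merged. */
    public synchronized void onTaskFailed() {
        // Intentionally a no-op: the aggregated totals are left as they are.
    }

    /** Returns a copy of the totals; cost does not depend on the task count. */
    public synchronized Map<String, Long> getStatistics() {
        return new HashMap<>(totals);
    }

    /** Number of task completions merged so far. */
    public synchronized int getCompletedTasks() {
        return completedTasks;
    }
}
```

With this shape, the merge work happens once per completed task rather than on 
every getStatistics() call, and running tasks contribute nothing until they 
finish.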

> Consider scanning unfinished tasks in VertexImpl::constructStatistics to 
> reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2565
>                 URL: https://issues.apache.org/jira/browse/TEZ-2565
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2565.1.patch, TEZ-2565.2.patch
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging 
> stats) depending on the number of invocations to Vertex::getStatistics().  
> Consider scanning only unfinished tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
