[ 
https://issues.apache.org/jira/browse/TEZ-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393710#comment-14393710
 ] 

Bikas Saha commented on TEZ-2265:
---------------------------------

That only handles TaskCounters, not all kinds of counters, e.g. filesystem or 
user-generated counters. This would actually provide a cleaner way to separate 
per-IO counters and then merge them going upwards, instead of playing tricks 
with the counter names, and it would handle all counters rather than only 
TaskCounters. So, progress on the per-IO counters should probably consider 
building on this and propagating and merging the counters appropriately. 
However, both approaches suffer from high memory overhead, which is the main 
barrier to enabling them by default.
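
As a rough illustration of "separate per IO, then merge going upwards", here is 
a minimal sketch in plain Java. It does not use the Tez counter classes; the 
nested map (group name -> counter name -> value), the class name 
PerIoCounterMerge and the group/counter names in the comments are all 
assumptions made for the example. The point is only that merging by group and 
name covers every kind of counter without encoding the IO into the counter 
name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a counters object:
// group name -> (counter name -> value).
class PerIoCounterMerge {

    // Sum the counters of every input/output into one task-level view.
    // Works for any group (task, filesystem, user-defined), since nothing
    // here depends on a fixed set of counter names.
    static Map<String, Map<String, Long>> merge(
            List<Map<String, Map<String, Long>>> perIoCounters) {
        Map<String, Map<String, Long>> merged = new HashMap<>();
        for (Map<String, Map<String, Long>> io : perIoCounters) {
            for (Map.Entry<String, Map<String, Long>> group : io.entrySet()) {
                Map<String, Long> target =
                        merged.computeIfAbsent(group.getKey(), k -> new HashMap<>());
                for (Map.Entry<String, Long> counter : group.getValue().entrySet()) {
                    // Same group and name from two IOs: add the values.
                    target.merge(counter.getKey(), counter.getValue(), Long::sum);
                }
            }
        }
        return merged;
    }
}
```

Note that the sketch keeps one map per IO plus the merged copy, which is the 
kind of memory overhead mentioned above.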

Thanks for the review. Will commit in a bit, all the way back to 0.5.

> All inputs/outputs in a task share the same counter object
> ----------------------------------------------------------
>
>                 Key: TEZ-2265
>                 URL: https://issues.apache.org/jira/browse/TEZ-2265
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-2265.1.patch
>
>
> As a result, an input that reads the value of a counter also sees updates 
> that were made by other inputs/outputs. Any decisions based on those counter 
> values could be wrong and, as of today, the output data size reported in the 
> vertex manager event or by shuffle pipelining is potentially wrong.
> A simple fix would be to have each IO keep its own counters and merge them 
> before reporting back up in the heartbeat.
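
To make the problem in the description concrete, the following is a small 
sketch in plain Java, not the actual Tez code. SimpleCounters, the counter 
name OUTPUT_BYTES and the byte values are hypothetical. With one shared 
object, an output reading its own size also picks up bytes written by another 
output; with one object per IO, each read is local and the task-level total 
is produced only when merging for the heartbeat.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal counter holder, used only for this illustration.
class SimpleCounters {
    private final Map<String, Long> values = new HashMap<>();

    void increment(String name, long delta) {
        values.merge(name, delta, Long::sum);
    }

    long get(String name) {
        return values.getOrDefault(name, 0L);
    }

    // Add every counter of 'other' into this object.
    void mergeFrom(SimpleCounters other) {
        other.values.forEach((name, value) -> values.merge(name, value, Long::sum));
    }
}

class SharedVersusPerIo {

    // Shared object: output A wrote 100 bytes but reads back A + B.
    static long sizeSeenByOutputAWhenShared() {
        SimpleCounters shared = new SimpleCounters();
        shared.increment("OUTPUT_BYTES", 100); // written by output A
        shared.increment("OUTPUT_BYTES", 250); // written by output B
        return shared.get("OUTPUT_BYTES");
    }

    // Per-IO objects: returns {value seen by output A, merged task total}.
    static long[] sizesWithPerIoCounters() {
        SimpleCounters outputA = new SimpleCounters();
        SimpleCounters outputB = new SimpleCounters();
        outputA.increment("OUTPUT_BYTES", 100);
        outputB.increment("OUTPUT_BYTES", 250);

        // Merge only when reporting upwards, e.g. before a heartbeat.
        SimpleCounters taskLevel = new SimpleCounters();
        taskLevel.mergeFrom(outputA);
        taskLevel.mergeFrom(outputB);
        return new long[] {outputA.get("OUTPUT_BYTES"), taskLevel.get("OUTPUT_BYTES")};
    }
}
```

In the shared case output A would report 350 bytes instead of 100; in the 
per-IO case it reports 100 while the merged total is still 350.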



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
