[ https://issues.apache.org/jira/browse/TEZ-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490282#comment-14490282 ]

Bikas Saha commented on TEZ-2234:
---------------------------------

Will add annotations.
getDataSize() is the logical data size as written by the user. The closest 
existing counter to that is OUTPUT_BYTES. For many jobs the difference between 
the two is large enough that perhaps we should look at reducing the overhead.
Yes, plugins are not getting task-level info for now; it is not needed for 
PIG-4434. The docs specify that the values are point-in-time snapshots and may 
change with progress, failures, or refreshes.
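Because the values are point-in-time snapshots, a plugin should re-read them each time it makes a decision rather than caching them. Below is a minimal Java sketch of that pattern. The `StatsSource` interface, the `shouldReconfigure` rule, the vertex name "map1" and the byte values are all hypothetical. Only the name `getDataSize()` comes from this comment. None of this is the actual Tez plugin API.

```java
import java.util.HashMap;
import java.util.Map;

class PointInTimeStatsDemo {
    /** Hypothetical stand-in for the per-source-vertex statistics a plugin can query. */
    interface StatsSource {
        /** Logical data size written so far by the named source vertex (a snapshot). */
        long getDataSize(String sourceVertex);
    }

    /** Hypothetical decision rule: reconfigure once a source's output reaches a threshold. */
    static boolean shouldReconfigure(StatsSource stats, String sourceVertex, long thresholdBytes) {
        // Query on every decision instead of caching: the snapshot may have
        // changed since the last call because of progress, failures or refreshes.
        long size = stats.getDataSize(sourceVertex);
        return size >= thresholdBytes;
    }

    public static void main(String[] args) {
        // Simulated snapshots; the vertex name and all values are made up.
        Map<String, Long> snapshot = new HashMap<>();
        StatsSource stats = v -> snapshot.getOrDefault(v, 0L);

        snapshot.put("map1", 800L);
        System.out.println(shouldReconfigure(stats, "map1", 1000L)); // false
        snapshot.put("map1", 1200L); // more output has been written
        System.out.println(shouldReconfigure(stats, "map1", 1000L)); // true
        snapshot.put("map1", 600L);  // hypothetical: value changed after a failure
        System.out.println(shouldReconfigure(stats, "map1", 1000L)); // false
    }
}
```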
This cannot get rid of the vertex manager (VM) events. There is no way to 
correlate tasks with output size, so extrapolating the final output size from 
the current output size using the ratio of completed tasks to total tasks does 
not work. The VM events are still needed until (if ever) we start exposing 
task-level sizes.
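The Java sketch below illustrates the extrapolation argument under stated assumptions. With per-task VM events, each completed task's size is known. A plugin can then scale the average over completed tasks up to the total task count. An aggregate snapshot cannot be tied to specific tasks. Suppose, as assumed here purely for illustration, that the snapshot also includes partial output from tasks still running. The same formula then overestimates. The class, method names and numbers are hypothetical and are not part of the Tez API.

```java
import java.util.HashMap;
import java.util.Map;

class OutputSizeExtrapolation {
    // Final output size per completed task, keyed by task index. In the scheme
    // described above, these would be reported through per-task VM events.
    private final Map<Integer, Long> completedTaskSizes = new HashMap<>();
    private final int totalTasks;

    OutputSizeExtrapolation(int totalTasks) {
        this.totalTasks = totalTasks;
    }

    /** Records one completed task's size; a repeated report for the same task replaces the old one. */
    void onTaskOutputReported(int taskIndex, long bytes) {
        completedTaskSizes.put(taskIndex, bytes);
    }

    /** Scales the average size of the completed tasks up to all tasks. */
    long estimateFinalSize() {
        int completed = completedTaskSizes.size();
        if (completed == 0) {
            return 0L; // nothing to extrapolate from yet
        }
        long sum = 0L;
        for (long bytes : completedTaskSizes.values()) {
            sum += bytes;
        }
        return sum * totalTasks / completed;
    }

    /** The same formula applied to an aggregate snapshot that cannot be tied to specific tasks. */
    static long naiveEstimate(long aggregateSize, int completedTasks, int totalTasks) {
        return completedTasks == 0 ? 0L : aggregateSize * totalTasks / completedTasks;
    }

    public static void main(String[] args) {
        OutputSizeExtrapolation est = new OutputSizeExtrapolation(10);
        est.onTaskOutputReported(0, 100L);
        est.onTaskOutputReported(1, 100L);
        System.out.println(est.estimateFinalSize()); // 1000

        // Assumed scenario: two running tasks have written 50 bytes each so far,
        // so the aggregate reads 300 while only 2 of 10 tasks have completed.
        System.out.println(naiveEstimate(300L, 2, 10)); // 1500
    }
}
```

The per-task version still assumes completed tasks are representative of the rest. Its advantage is that it divides by the number of tasks that actually produced the bytes it sums. An aggregate snapshot cannot guarantee that.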

Thanks for the reviews!

> Allow vertex managers to get output size per source vertex
> ----------------------------------------------------------
>
>                 Key: TEZ-2234
>                 URL: https://issues.apache.org/jira/browse/TEZ-2234
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-2234.1.patch, TEZ-2234.2.patch, TEZ-2234.3.patch
>
>
> Vertex managers may need per source vertex output stats to make 
> reconfiguration decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
