[ 
https://issues.apache.org/jira/browse/TEZ-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486621#comment-14486621
 ] 

Rajesh Balamohan commented on TEZ-2234:
---------------------------------------

- Might want to remove the unwanted import in LogicalIOProcessorRuntimeTask.
- For Inputs, SHUFFLE_BYTES_DECOMPRESSED is considered.  For Outputs, 
OUTPUT_BYTES is considered.  OUTPUT_BYTES_WITH_OVERHEAD would be a closer match 
for SHUFFLE_BYTES_DECOMPRESSED, but that depends on what is being reported as 
the data size for an output: is it just the data without overhead, or the 
amount of data processed including any serialization overhead? In the latter 
case we might want to consider OUTPUT_BYTES_WITH_OVERHEAD.  Note that for 
PipelinedSorter, OUTPUT_BYTES_WITH_OVERHEAD and OUTPUT_BYTES_PHYSICAL are not 
populated (tracked in TEZ-2198).
- If tasks are in progress (with speculation on), TaskImpl.getStatistics() 
chooses the attempt with the most progress and gathers its stats.  The stats 
might be slightly misleading if the progress reverses at a later point in 
time. Hopefully that is fine, as the difference might not be huge.
- Why should IOIndices be a map and not a set? Will the indices be used later?
- Can you please share more details on the TODO in ShuffleUtils (or create a 
separate JIRA)?
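
To illustrate the speculation point above, here is a minimal sketch of how 
picking the most-progressed attempt can make the reported size move backwards 
between two snapshots. This is one reading of "progress reverses" (a different 
attempt takes the lead). The Attempt class, its fields, and 
reportedOutputBytes() are hypothetical stand-ins and are not taken from the 
patch or from TaskImpl.

```java
import java.util.Arrays;
import java.util.List;

public class BestAttemptStats {

    /** Hypothetical stand-in for a running task attempt; not a Tez class. */
    static final class Attempt {
        final String id;
        final float progress;    // 0.0 to 1.0
        final long outputBytes;  // output bytes reported so far

        Attempt(String id, float progress, long outputBytes) {
            this.id = id;
            this.progress = progress;
            this.outputBytes = outputBytes;
        }
    }

    /**
     * Returns the output bytes of the attempt with the highest progress,
     * or 0 if there are no attempts. Assumed selection rule, for illustration.
     */
    static long reportedOutputBytes(List<Attempt> attempts) {
        Attempt best = null;
        for (Attempt a : attempts) {
            if (best == null || a.progress > best.progress) {
                best = a;
            }
        }
        return best == null ? 0L : best.outputBytes;
    }

    public static void main(String[] args) {
        // Snapshot 1: the original attempt leads.
        List<Attempt> earlier = Arrays.asList(
            new Attempt("attempt_0", 0.60f, 900L),
            new Attempt("attempt_1", 0.40f, 500L));
        // Snapshot 2: the speculative attempt has overtaken it.
        List<Attempt> later = Arrays.asList(
            new Attempt("attempt_0", 0.65f, 950L),
            new Attempt("attempt_1", 0.70f, 650L));

        System.out.println(reportedOutputBytes(earlier)); // prints 900
        System.out.println(reportedOutputBytes(later));   // prints 650
    }
}
```

In this made-up scenario the reported size drops from 900 to 650 even though 
both attempts made forward progress, which is the kind of small inconsistency 
a vertex manager consuming these stats would need to tolerate.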

> Allow vertex managers to get output size per source vertex
> ----------------------------------------------------------
>
>                 Key: TEZ-2234
>                 URL: https://issues.apache.org/jira/browse/TEZ-2234
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-2234.1.patch, TEZ-2234.2.patch
>
>
> Vertex managers may need per source vertex output stats to make 
> reconfiguration decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
