[
https://issues.apache.org/jira/browse/TEZ-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486621#comment-14486621
]
Rajesh Balamohan commented on TEZ-2234:
---------------------------------------
- Might want to remove the unwanted import in LogicalIOProcessorRuntimeTask.
- For Inputs, SHUFFLE_BYTES_DECOMPRESSED is considered. For Outputs,
OUTPUT_BYTES is considered. OUTPUT_BYTES_WITH_OVERHEAD will be a closer match
for SHUFFLE_BYTES_DECOMPRESSED. But it depends on what is to be reported as
data sizes for output (is it just the data without overhead, or is it the
amount of data processed along with any serialization overhead, in which case
we might want to consider OUTPUT_BYTES_WITH_OVERHEAD?). For PipelinedSorter,
OUTPUT_BYTES_WITH_OVERHEAD and OUTPUT_BYTES_PHYSICAL are not populated
(tracked in TEZ-2198).
- If tasks are in progress (with speculation on), TaskImpl.getStatistics()
chooses the best progressed attempt and gathers the stats. Stats might be
slightly misleading if the progress reverses at a later point in time.
Hopefully that is fine, as the diff might not be huge.
- Why should IOIndices be a map and not a set? Will the indices be used later?
- Can you please share more details on the TODO in ShuffleUtils (or create a
separate JIRA)?
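
To illustrate the speculation point above, here is a minimal sketch of how
picking the best progressed attempt can make the reported size shift between
two polls. This is not the code in TaskImpl: the Attempt class, its fields,
and bestAttemptOutputBytes() are hypothetical stand-ins, and it assumes that
"progress reverses" means a different attempt takes the lead later.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Tez classes.
final class AttemptStatsSketch {

    /** One running attempt: its progress in [0, 1] and the output bytes it has reported so far. */
    static final class Attempt {
        final String id;
        final float progress;
        final long outputBytes;

        Attempt(String id, float progress, long outputBytes) {
            this.id = id;
            this.progress = progress;
            this.outputBytes = outputBytes;
        }
    }

    /** Returns the output bytes of the attempt with the highest progress, or 0 if there are none. */
    static long bestAttemptOutputBytes(List<Attempt> attempts) {
        return attempts.stream()
                .max(Comparator.comparingDouble((Attempt a) -> a.progress))
                .map(a -> a.outputBytes)
                .orElse(0L);
    }

    public static void main(String[] args) {
        // First poll: attempt_0 leads, so its stats are the ones reported.
        List<Attempt> early = Arrays.asList(
                new Attempt("attempt_0", 0.6f, 600L),
                new Attempt("attempt_1", 0.4f, 380L));
        // Later poll: the speculative attempt_1 has overtaken attempt_0.
        List<Attempt> later = Arrays.asList(
                new Attempt("attempt_0", 0.7f, 700L),
                new Attempt("attempt_1", 0.9f, 860L));

        System.out.println(bestAttemptOutputBytes(early)); // prints 600
        System.out.println(bestAttemptOutputBytes(later)); // prints 860
    }
}
```

The two polls report numbers from different attempts, so a vertex manager
comparing them sees a jump that is not purely new output from one attempt.
With made-up numbers like these the difference is small, which matches the
expectation that the diff should not be huge.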
> Allow vertex managers to get output size per source vertex
> ----------------------------------------------------------
>
> Key: TEZ-2234
> URL: https://issues.apache.org/jira/browse/TEZ-2234
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-2234.1.patch, TEZ-2234.2.patch
>
>
> Vertex managers may need per source vertex output stats to make
> reconfiguration decisions.