[
https://issues.apache.org/jira/browse/TEZ-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488269#comment-14488269
]
Bikas Saha commented on TEZ-2234:
---------------------------------
bq. might want to remove unwanted import in LogicalIOProcessorRuntimeTask
Removed
bq. SHUFFLE_BYTES_DECOMPRESSED is considered. For Outputs, OUTPUT_BYTES is
considered.
We want to give logical data written/read (at least for the current APIs).
That's why OUTPUT_BYTES and SHUFFLE_BYTES_DECOMPRESSED have been used.
However, in a local run, their values don't match up. Wondering why?
Do you have any clues?
bq. progress (with speculation on), TaskImpl.getStatistics() chooses the best
progressed attempt
Yes. The documentation says that these are point-in-time values and can change
with a refresh.
bq. Why should IOIndices be a map and not a set?. Will indices be used later?
It's currently not used, but a map is put in place so that more info can be
added later if needed. The current integer value can be used to create an array
of statistics values (per logical edge) instead of using a map (in the
TaskStatistics object). However, memory overhead is small even with a map, so
the array-based impl with these indices was not needed.
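To make the trade-off concrete, here is a minimal sketch of the two layouts: statistics keyed by edge name in a map, and the same values laid out in an array addressed by the integer index. The class and method names (EdgeStatsSketch, record, toArray) are hypothetical and are not taken from the patch.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Tez code: map-keyed statistics vs. an index-addressed array.
class EdgeStatsSketch {
    // Edge name -> dense integer index (the role the indices map plays in the discussion).
    private final Map<String, Integer> ioIndices = new LinkedHashMap<>();
    // Map-based storage: edge name -> accumulated bytes.
    private final Map<String, Long> statsByName = new HashMap<>();

    // Returns the index for an edge, assigning the next free index on first use.
    int indexOf(String edgeName) {
        Integer idx = ioIndices.get(edgeName);
        if (idx == null) {
            idx = ioIndices.size();
            ioIndices.put(edgeName, idx);
        }
        return idx;
    }

    // Adds bytes to the running total for the given edge.
    void record(String edgeName, long bytes) {
        indexOf(edgeName);
        statsByName.merge(edgeName, bytes, Long::sum);
    }

    // Array-based alternative: one slot per edge, addressed by its integer index.
    long[] toArray() {
        long[] values = new long[ioIndices.size()];
        for (Map.Entry<String, Integer> e : ioIndices.entrySet()) {
            values[e.getValue()] = statsByName.getOrDefault(e.getKey(), 0L);
        }
        return values;
    }
}
```

With only a handful of logical edges per task, the map costs little more than the array, which is the reasoning behind keeping the map.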
bq. Can you plz share more details on the TODO in ShuffleUtils (or create a
separate JIRA)?
The TODO is orthogonal to the patch. I was not sure if finalMergeEnabled would
cause multiple VM events to be sent out (one per spill) before isLastEvent
becomes true. If that is the case, then I will open a new JIRA to track that.
What do you think? The solution would be to move the VM event sending code to
the close() method.
{code}
if (finalMergeEnabled || isLastEvent) {
  ShuffleUserPayloads.VertexManagerEventPayloadProto.Builder vmBuilder =
      ShuffleUserPayloads.VertexManagerEventPayloadProto.newBuilder();
  long outputSize =
      context.getCounters().findCounter(TaskCounter.OUTPUT_BYTES).getValue();
  // Set this information only when required. In pipelined shuffle, multiple events would end
  // up adding up to final outputsize. This is needed for auto-reduce parallelism to work
  // properly.
  vmBuilder.setOutputSize(outputSize);
  VertexManagerEvent vmEvent = VertexManagerEvent.create(
      context.getDestinationVertexName(),
      vmBuilder.build().toByteString().asReadOnlyByteBuffer());
  eventList.add(vmEvent);
}
{code}
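A rough sketch of the proposed alternative, where each spill only accumulates its size and a single size report goes out from close(). This is plain Java under assumed names (DeferredSizeReporter, onSpill, sentEvents); it does not use the Tez event classes and is not how the patch is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Tez code: defer the output-size report until close().
class DeferredSizeReporter implements AutoCloseable {
    // Stands in for the event list; each entry is one reported output size.
    private final List<Long> sentEvents = new ArrayList<>();
    private long outputBytes = 0;
    private boolean closed = false;

    // Called once per spill; only accumulates, never sends.
    void onSpill(long spillBytes) {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        outputBytes += spillBytes;
    }

    // Sends exactly one report with the final total; repeated calls are no-ops.
    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        sentEvents.add(outputBytes);
    }

    List<Long> sentEvents() {
        return sentEvents;
    }
}
```

Under this arrangement the receiver sees one event with the final total regardless of how many spills occurred, so it would not need to sum partial sizes.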
> Allow vertex managers to get output size per source vertex
> ----------------------------------------------------------
>
> Key: TEZ-2234
> URL: https://issues.apache.org/jira/browse/TEZ-2234
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-2234.1.patch, TEZ-2234.2.patch
>
>
> Vertex managers may need per source vertex output stats to make
> reconfiguration decisions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)