[jira] [Commented] (TEZ-2234) Allow vertex managers to get output size per source vertex

Bikas Saha (JIRA) Thu, 09 Apr 2015 14:01:20 -0700

    [ https://issues.apache.org/jira/browse/TEZ-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488269#comment-14488269 ]


Bikas Saha commented on TEZ-2234:
---------------------------------

bq. might want to remove unwanted import in LogicalIOProcessorRuntimeTask
Removed

bq.  SHUFFLE_BYTES_DECOMPRESSED is considered. For Outputs, OUTPUT_BYTES is considered.
We want to report logical data written/read (at least for the current APIs). That's why OUTPUT_BYTES and SHUFFLE_BYTES_DECOMPRESSED are used. However, in a local run their values don't match up. I'm wondering why; do you have any clues?
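A minimal sketch of the counter choice described above. Counter, IoKind and LogicalBytes are hypothetical stand-ins for the real Tez counter classes (in the patch the values come from findCounter(TaskCounter.X).getValue()). It only shows which counter is read for each IO direction; it does not explain the mismatch seen in the local run.

```java
import java.util.Map;

// Hypothetical stand-ins: real Tez code reads these values with
// context.getCounters().findCounter(TaskCounter.X).getValue().
enum Counter { OUTPUT_BYTES, SHUFFLE_BYTES_DECOMPRESSED }

enum IoKind { INPUT, OUTPUT }

final class LogicalBytes {
    private LogicalBytes() {}

    // Counter used to report logical bytes for each IO direction.
    static Counter logicalCounter(IoKind kind) {
        switch (kind) {
            case INPUT:
                return Counter.SHUFFLE_BYTES_DECOMPRESSED; // logical bytes read
            case OUTPUT:
                return Counter.OUTPUT_BYTES;               // logical bytes written
            default:
                throw new IllegalArgumentException("unknown IO kind: " + kind);
        }
    }

    // Returns the logical byte count, or 0 if the counter is absent.
    static long logicalBytes(IoKind kind, Map<Counter, Long> counters) {
        return counters.getOrDefault(logicalCounter(kind), 0L);
    }
}
```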

bq. progress (with speculation on), TaskImpl.getStatistics() chooses the best progressed attempt
Yes. The documentation says that these are point-in-time values that can change on a refresh.
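To illustrate why these are point-in-time values, here is a rough sketch of picking the best-progressed attempt. AttemptReport and TaskStatsView are hypothetical types, not TaskImpl's actual code: with speculation on, a later call may pick a different attempt and therefore return different numbers.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a task attempt: its progress in [0, 1] and a
// snapshot of its per-edge statistics at the time it was reported.
final class AttemptReport {
    final String attemptId;
    final float progress;
    final Map<String, Long> statsSnapshot;

    AttemptReport(String attemptId, float progress, Map<String, Long> statsSnapshot) {
        this.attemptId = attemptId;
        this.progress = progress;
        this.statsSnapshot = statsSnapshot;
    }
}

final class TaskStatsView {
    private TaskStatsView() {}

    // Returns the statistics of the most-progressed attempt (empty if none).
    // The chosen attempt can change between calls, so the result is only a
    // point-in-time view; callers re-query to refresh it.
    static Map<String, Long> bestAttemptStatistics(List<AttemptReport> attempts) {
        AttemptReport best = null;
        for (AttemptReport a : attempts) {
            if (best == null || a.progress > best.progress) {
                best = a;
            }
        }
        return best == null ? Collections.<String, Long>emptyMap() : best.statsSnapshot;
    }
}
```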

bq. Why should IOIndices be a map and not a set?. Will indices be used later?
It's currently not used, but a map is in place so that more info can be added later if needed. The integer value could be used to index an array of statistics values (one per logical edge) instead of using a map in the TaskStatistics object. However, the memory overhead of a map is small, so the array-based implementation using these indices was not needed.
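A small sketch of the array-based alternative mentioned above, assuming a hypothetical EdgeStats class: an IOIndices-style map assigns each logical edge a dense integer, which then indexes a long[] instead of a Map<String, Long>.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-edge byte counts stored in an array, using an
// IOIndices-style map only to translate edge names into array positions.
final class EdgeStats {
    private final Map<String, Integer> ioIndices = new HashMap<>();
    private final long[] bytesByIndex; // array-based alternative to Map<String, Long>

    EdgeStats(List<String> edgeNames) {
        for (int i = 0; i < edgeNames.size(); i++) {
            ioIndices.put(edgeNames.get(i), i);
        }
        bytesByIndex = new long[edgeNames.size()];
    }

    void addBytes(String edge, long bytes) {
        bytesByIndex[indexOf(edge)] += bytes;
    }

    long getBytes(String edge) {
        return bytesByIndex[indexOf(edge)];
    }

    private int indexOf(String edge) {
        Integer idx = ioIndices.get(edge);
        if (idx == null) {
            throw new IllegalArgumentException("unknown edge: " + edge);
        }
        return idx;
    }
}
```

The array saves one boxed Long per edge, which is the small overhead the comment above judged not worth optimizing.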

bq. Can you plz share more details on the TODO in ShuffleUtils (or create a separate JIRA)?
The TODO is orthogonal to this patch. I was not sure whether finalMergeEnabled would cause multiple VM events to be sent (one per spill) before isLastEvent becomes true. If so, I will open a new JIRA to track it. What do you think? The fix would be to move the VM event sending code into the close() method.
{code}
    if (finalMergeEnabled || isLastEvent) {
      ShuffleUserPayloads.VertexManagerEventPayloadProto.Builder vmBuilder =
          ShuffleUserPayloads.VertexManagerEventPayloadProto.newBuilder();

      long outputSize =
          context.getCounters().findCounter(TaskCounter.OUTPUT_BYTES).getValue();

      // Set this information only when required. In pipelined shuffle, multiple events would end
      // up adding up to final outputsize. This is needed for auto-reduce parallelism to work
      // properly.
      vmBuilder.setOutputSize(outputSize);
      VertexManagerEvent vmEvent = VertexManagerEvent.create(
          context.getDestinationVertexName(),
          vmBuilder.build().toByteString().asReadOnlyByteBuffer());
      eventList.add(vmEvent);
    }
{code}
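A rough model of the concern, under the unconfirmed assumption that the event-generation code above runs once per spill. VmEventSketch is hypothetical, and its running total only stands in for the cumulative OUTPUT_BYTES counter. With finalMergeEnabled, the current condition emits one event per spill, while emitting from close() yields exactly one event carrying the final size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Tez code: "events" are just the output sizes
// that would be placed in VertexManagerEvent payloads.
final class VmEventSketch {
    private VmEventSketch() {}

    // Current condition, assumed to be evaluated once per spill;
    // the last spill is the one with isLastEvent == true.
    static List<Long> emitPerSpill(long[] spillBytes, boolean finalMergeEnabled) {
        List<Long> events = new ArrayList<>();
        long outputBytes = 0; // stands in for the cumulative OUTPUT_BYTES counter
        for (int i = 0; i < spillBytes.length; i++) {
            outputBytes += spillBytes[i];
            boolean isLastEvent = (i == spillBytes.length - 1);
            if (finalMergeEnabled || isLastEvent) {
                events.add(outputBytes);
            }
        }
        return events;
    }

    // Proposed: spills only update the counter; close() sends a single event.
    static List<Long> emitOnClose(long[] spillBytes) {
        long outputBytes = 0;
        for (long b : spillBytes) {
            outputBytes += b;
        }
        List<Long> events = new ArrayList<>();
        events.add(outputBytes); // the one event sent from close()
        return events;
    }
}
```

For spills of 100, 50 and 25 bytes, emitPerSpill(spills, true) produces three events (100, 150, 175), while emitPerSpill(spills, false) and emitOnClose(spills) each produce a single event of 175.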



> Allow vertex managers to get output size per source vertex
> ----------------------------------------------------------
>
>                 Key: TEZ-2234
>                 URL: https://issues.apache.org/jira/browse/TEZ-2234
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-2234.1.patch, TEZ-2234.2.patch
>
>
> Vertex managers may need per source vertex output stats to make 
> reconfiguration decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
