[jira] [Commented] (TEZ-3222) Reduce messaging overhead for auto-reduce parallelism case

Bikas Saha (JIRA) Wed, 19 Oct 2016 22:20:06 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590836#comment-15590836
 ]


Bikas Saha commented on TEZ-3222:
---------------------------------

{code} -    return commonRouteMeta[sourceTaskIndex];
+    return CompositeEventRouteMetadata.create(1, sourceTaskIndex, 0);
{code}
The removed code is looking up an array indexed by sourceTaskIndex while the 
new code is directly using the sourceTaskIndex. Is there a difference? Also, 
reusing the caching (as done earlier) may improve critical path CPU for object 
creation. Though for broadcast edge I am not sure if CDME is used as of now.

{code}
+message CompositeRoutedDataMovementEventProto {
+  optional int32 source_index = 1;
+  optional int32 target_index = 2;
+  optional int32 count = 3;
+  optional bytes user_payload = 4;
+  optional int32 version = 5;
+}{code}
Can we create a message for CompositeRouteMeta and use it vs expanding its 
contents. That way CompositeRouteMeta could evolve independently.

{code}
     if (event instanceof DataMovementEvent) {
       numDmeEvents.incrementAndGet();
-      processDataMovementEvent((DataMovementEvent)event);
+      DataMovementEvent dmEvent = (DataMovementEvent)event;
+      DataMovementEventPayloadProto shufflePayload;
+      try {
+        shufflePayload = 
DataMovementEventPayloadProto.parseFrom(ByteString.copyFrom(dmEvent.getUserPayload()));
+      } catch (InvalidProtocolBufferException e) {
+        throw new TezUncheckedException("Unable to parse DataMovementEvent 
payload", e);
+      }
+      BitSet emptyPartitionsBitSet = null;
+      if (shufflePayload.hasEmptyPartitions()) {
+        try {
+          byte[] emptyPartitions = 
TezCommonUtils.decompressByteStringToByteArray(shufflePayload.getEmptyPartitions(),
 inflater);
{code} 
I dont think DME's dont have empty partition bitset since they dont have 
multi-partition data. Right? [~rajesh.balamohan] 

Rest looks good to me.
+1
Thanks!



> Reduce messaging overhead for auto-reduce parallelism case
> ----------------------------------------------------------
>
>                 Key: TEZ-3222
>                 URL: https://issues.apache.org/jira/browse/TEZ-3222
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3222.1.patch, TEZ-3222.2.patch, TEZ-3222.3.patch, 
> TEZ-3222.4.patch, TEZ-3222.5.patch, TEZ-3222.6.patch
>
>
> A dag with 15k x 1000k vertex may auto-reduce to 15k x 1. And while the data  
> size is appropriate for 1 task attempt, this results in an increase in task 
> attempt message processing of 1000x.
> This jira aims to reduce the message processing in the auto-reduced task 
> while keeping the amount of message processing in the AM the same or less.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-3222) Reduce messaging overhead for auto-reduce parallelism case

Reply via email to