[ 
https://issues.apache.org/jira/browse/TEZ-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-646:
-------------------------------

    Attachment: TEZ-646.1.txt

The patch introduces a CompositeDataMovementEvent, which can be used when all 
Outputs from a task have the same payload.

Two gains: the amount of data sent over RPC is reduced, and the AM can keep a 
single copy of the payload instead of multiple copies of the same payload.
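
To illustrate the idea, here is a minimal sketch of one event carrying a single 
payload for a contiguous range of outputs, expanded into per-output events that 
all view the same bytes. The class names (CompositeEventSketch, SimpleEvent, 
SimpleCompositeEvent) and the expand() method are hypothetical and made up for 
this example; this is not the code in the attached patch and not the Tez API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these classes are hypothetical, not the Tez API.
class CompositeEventSketch {

    /** Hypothetical per-output event: a source index plus a view of the payload. */
    static final class SimpleEvent {
        final int sourceIndex;
        final ByteBuffer payload;

        SimpleEvent(int sourceIndex, ByteBuffer payload) {
            this.sourceIndex = sourceIndex;
            this.payload = payload;
        }
    }

    /** Hypothetical composite event: one payload for a contiguous range of outputs. */
    static final class SimpleCompositeEvent {
        final int sourceIndexStart;
        final int count;
        final ByteBuffer payload;

        SimpleCompositeEvent(int sourceIndexStart, int count, ByteBuffer payload) {
            this.sourceIndexStart = sourceIndexStart;
            this.count = count;
            this.payload = payload;
        }

        /** Expands into per-output events whose buffers all view the same bytes. */
        List<SimpleEvent> expand() {
            List<SimpleEvent> events = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                // asReadOnlyBuffer() shares the underlying content; no bytes are copied.
                events.add(new SimpleEvent(sourceIndexStart + i, payload.asReadOnlyBuffer()));
            }
            return events;
        }
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        SimpleCompositeEvent composite = new SimpleCompositeEvent(0, 3, payload);
        for (SimpleEvent e : composite.expand()) {
            System.out.println("output " + e.sourceIndex
                    + ", payload bytes: " + e.payload.remaining());
        }
    }
}
```

In this sketch only the composite event crosses the wire, and the receiver holds 
one backing array no matter how many per-output events it fans out to.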

This reduces the size of TezEvents from ~220 bytes to 64 bytes, which is still 
a fair amount. Avoiding copies of TezEvents, etc. will be dealt with in a 
separate jira.

Unit tests are pending. Meanwhile, [~bikassaha], [~hitesh], could one of you 
please take a look?

> Avoid creating multiple copies of the same Event payload
> --------------------------------------------------------
>
>                 Key: TEZ-646
>                 URL: https://issues.apache.org/jira/browse/TEZ-646
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: TEZ-646.1.txt
>
>
> OnFileSortedOutput generates the same event payload for all downstream tasks. 
> As an example, for a simple MR job, the number of copies of this payload is 
> equal to the number of reduce tasks.
> This needs to be done in a clean manner though, since the event model is 
> meant to generate a separate payload for each downstream task.



