[ https://issues.apache.org/jira/browse/TEZ-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated TEZ-646:
-------------------------------
Attachment: TEZ-646.1.txt
The patch introduces a CompositeDataMovementEvent, which can be used when all
Outputs from a task have the same payload.
Two gains: less data is sent over RPC, and the AM can keep a single copy of
the payload instead of multiple copies of the same payload.
It reduces the size of TezEvents from ~220 bytes to 64 bytes. That's still a fair
amount. Will deal with avoiding copies of TezEvents etc. in a separate jira.
Unit tests are pending. Meanwhile, [~bikassaha], [~hitesh] - could one of you
please take a look?
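
For illustration only (this is not taken from the patch): a minimal sketch of
the composite-event idea. SingleEvent and CompositeEvent are hypothetical
stand-ins, not the actual Tez event classes, and the payload bytes are a
placeholder. The sketch assumes one event describes a contiguous range of
output indices sharing one payload, and that the receiver expands it into
per-index events that all reference the same byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CompositeEventSketch {

    /** Hypothetical per-output event: one source index plus a payload reference. */
    static final class SingleEvent {
        final int sourceIndex;
        final byte[] payload;

        SingleEvent(int sourceIndex, byte[] payload) {
            this.sourceIndex = sourceIndex;
            this.payload = payload;
        }
    }

    /** Hypothetical composite event: a range of source indices sharing one payload. */
    static final class CompositeEvent {
        final int startIndex;
        final int count;
        final byte[] payload;

        CompositeEvent(int startIndex, int count, byte[] payload) {
            this.startIndex = startIndex;
            this.count = count;
            this.payload = payload;
        }

        /** Expands into per-index events that reference the same payload array (no copy). */
        List<SingleEvent> expand() {
            List<SingleEvent> events = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                events.add(new SingleEvent(startIndex + i, payload));
            }
            return events;
        }
    }

    public static void main(String[] args) {
        // Placeholder payload; the real payload contents are not modelled here.
        byte[] payload = "example-payload".getBytes(StandardCharsets.UTF_8);

        // One event on the wire stands in for four outputs with identical payloads.
        CompositeEvent composite = new CompositeEvent(0, 4, payload);
        List<SingleEvent> expanded = composite.expand();

        System.out.println("expanded events: " + expanded.size());
        System.out.println("payload shared: "
                + (expanded.get(0).payload == expanded.get(3).payload));
    }
}
```

With this shape, the sender ships the payload once regardless of the number of
downstream tasks, and the receiver holds one byte array referenced by every
expanded event.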
> Avoid creating multiple copies of the same Event payload
> --------------------------------------------------------
>
> Key: TEZ-646
> URL: https://issues.apache.org/jira/browse/TEZ-646
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: TEZ-646.1.txt
>
>
> OnFileSortedOutput generates the same event payload for all downstream tasks.
> For example, in a simple MR job the number of copies of this payload is equal
> to the number of reduce tasks.
> Avoiding the copies needs to be done in a clean manner though, since the event
> model is meant to generate a separate payload for each downstream task.