[
https://issues.apache.org/jira/browse/TEZ-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Eagles updated TEZ-3936:
---------------------------------
Attachment: TEZ-3936.001.patch
> Reduce TezEvent messaging overhead
> ----------------------------------
>
> Key: TEZ-3936
> URL: https://issues.apache.org/jira/browse/TEZ-3936
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3936.001.patch
>
>
> Revisiting TEZ-3145, I found that, in addition to improving the way empty
> partitions are sent from Maps to the AM and from the AM to Reducers, message
> serialization can be improved to reduce network traffic.
> For example, consider a job with 42000 Maps and 7500 Reducers where 95% of the
> partition data produced is empty. The volume of Tez DME events sent from the AM
> to the Reducers is num(Maps) * num(Reducers) * size(Wrapped DME). With 95% empty
> partitions, the message size is 450 bytes, of which 260 bytes are needed for
> sending empty partitions and 190 bytes for messaging. Total messaging is 132 GB:
> 76 GB for empty partition data and 56 GB for non-empty partition messaging.
> This jira aims to reduce the non-empty partition messaging.
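
The figures in the description can be re-derived from the formula it gives. The sketch below is only an illustration of that arithmetic, not code from the patch: the function name `dme_traffic_gib` is hypothetical, and it assumes the quoted "GB" values are binary gigabytes (2^30 bytes), since that is the unit under which 450, 260 and 190 bytes per event come out at roughly 132, 76 and 56.

```python
# Assumption: "GB" in the issue means 2**30 bytes; the issue does not say so.
GIB = 2 ** 30


def dme_traffic_gib(num_maps: int, num_reducers: int, bytes_per_event: int) -> float:
    """Traffic from the AM to the Reducers, per the formula in the issue:
    num(Maps) * num(Reducers) * size(Wrapped DME), expressed in GiB."""
    return num_maps * num_reducers * bytes_per_event / GIB


# Job shape and per-event sizes are taken from the issue description.
maps, reducers = 42_000, 7_500

total_gib = dme_traffic_gib(maps, reducers, 450)      # whole wrapped event
empty_gib = dme_traffic_gib(maps, reducers, 260)      # empty-partition data
messaging_gib = dme_traffic_gib(maps, reducers, 190)  # remaining messaging bytes

print(f"total:     {total_gib:.1f} GiB")
print(f"empty:     {empty_gib:.1f} GiB")
print(f"messaging: {messaging_gib:.1f} GiB")
```

Because every Map produces one event per Reducer (315 million events in this job), each byte saved per event removes about 0.3 GiB of traffic, which is why the per-message overhead is worth reducing.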
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)