[ 
https://issues.apache.org/jira/browse/TEZ-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3936:
---------------------------------
    Attachment: TEZ-3936.002.patch

> Reduce TezEvent messaging overhead
> ----------------------------------
>
>                 Key: TEZ-3936
>                 URL: https://issues.apache.org/jira/browse/TEZ-3936
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3936.001.patch, TEZ-3936.002.patch
>
>
> Revisiting TEZ-3145, I found that in addition to improving the way empty 
> partitions are sent from Maps to the AM and from the AM to Reducers, message 
> serialization can be improved to reduce network traffic.
> For example, consider a job with 42000 Maps and 7500 Reducers where 95% of 
> the partition data produced is empty. The volume of Tez DME events sent from 
> the AM to the Reducers is num(Maps) * num(Reducers) * size(Wrapped DME). With 
> 95% empty partitions, the message size is 450 bytes, of which 260 bytes are 
> needed for sending empty partitions and 190 bytes for messaging. Total 
> messaging is 132 GBs: 76 GBs for empty partition data and 56 GBs for 
> non-empty partition messaging. 
> This jira aims to reduce the non-empty partition messaging.
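
The totals in the description can be reproduced with a quick back-of-the-envelope calculation. The sketch below is not part of the patch; it only applies the num(Maps) * num(Reducers) * size(Wrapped DME) formula to the quoted figures. The variable and function names are made up for illustration, and it assumes that "GBs" means 2**30 bytes, which is the reading under which the quoted numbers line up.

```python
# Figures taken from the issue description.
NUM_MAPS = 42_000
NUM_REDUCERS = 7_500
EMPTY_PARTITION_BYTES = 260  # part of each wrapped DME describing empty partitions
MESSAGING_BYTES = 190        # remaining per-message overhead

# Assumption: "GBs" in the description means 2**30 bytes (GiB).
GIB = 1024 ** 3


def total_gib(bytes_per_event: int) -> float:
    """Traffic in GiB when every (map, reducer) pair receives one event."""
    return NUM_MAPS * NUM_REDUCERS * bytes_per_event / GIB


empty_gib = total_gib(EMPTY_PARTITION_BYTES)
messaging_gib = total_gib(MESSAGING_BYTES)
overall_gib = total_gib(EMPTY_PARTITION_BYTES + MESSAGING_BYTES)

print(f"events sent by the AM:  {NUM_MAPS * NUM_REDUCERS:,}")
print(f"empty partition data:   {empty_gib:.0f} GiB")
print(f"non-empty messaging:    {messaging_gib:.0f} GiB")
print(f"total:                  {overall_gib:.0f} GiB")
```

Rounded to whole units, this gives 76, 56 and 132, matching the description. In decimal gigabytes (10**9 bytes) the total would instead be about 142.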



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
