[ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011543#comment-14011543
 ] 

Bikas Saha commented on TEZ-1157:
---------------------------------

To be clear, this is related to, but not entirely, a broadcast problem. Broadcast
by itself means that the output will be accessible to all consumers. It does
not mean that the output will be read in its entirety by all consumers. In some
(maybe most) cases, the output will be read in its entirety. This is the
behavior of unsorted-kv-input when used on a broadcast edge: it reads all the
data from the producer, which is wasteful when the broadcast edge has
multiple consumers that run concurrently.

> Optimize broadcast :- Tasks pertaining to same job in same machine should not 
> download multiple copies of broadcast data
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1157
>                 URL: https://issues.apache.org/jira/browse/TEZ-1157
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>              Labels: performance
>         Attachments: TEZ-1152.WIP.patch
>
>
> Currently, tasks (belonging to the same job) running on the same machine each
> download their own copy of broadcast data. An optimization could be to
> download one copy per machine and let the rest of the tasks refer to this
> downloaded copy.
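The "download one copy per machine" idea can be sketched as a node-local cache in which the first task to request a broadcast output performs the fetch and every other task on that machine, concurrent or later, reuses the result. This is a minimal illustration under stated assumptions, not the Tez API and not the approach in the attached patch. The class `NodeLocalBroadcastCache`, the `fetcher` function, and the key type are all hypothetical. It keeps values in memory, whereas a real implementation would more likely share a downloaded file on local disk. It also does no eviction.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical per-machine cache for broadcast data: one download per key,
// shared by all tasks on the same machine. Not a Tez class.
final class NodeLocalBroadcastCache<K, V> {
    // One future per broadcast output. Concurrent callers wait on the same future.
    private final ConcurrentMap<K, CompletableFuture<V>> entries = new ConcurrentHashMap<>();
    // Hypothetical: performs the actual fetch from the producer.
    private final Function<K, V> fetcher;

    NodeLocalBroadcastCache(Function<K, V> fetcher) {
        this.fetcher = fetcher;
    }

    V get(K key) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = entries.putIfAbsent(key, mine);
        if (existing != null) {
            // Another task already started (or finished) the download: reuse it.
            return existing.join();
        }
        try {
            // This caller won the race, so it is the only one that downloads.
            V value = fetcher.apply(key);
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            // Drop the failed entry so a later task can retry, and wake up waiters.
            entries.remove(key, mine);
            mine.completeExceptionally(e);
            throw e;
        }
    }
}
```

The sketch uses `putIfAbsent` with a future rather than `ConcurrentHashMap.computeIfAbsent`, because `computeIfAbsent` may block other updates to the map while its mapping function runs, and a network download is far too long for that. With this design, eight concurrent tasks asking for the same output trigger a single fetch and all receive the same data.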



--
This message was sent by Atlassian JIRA
(v6.2#6252)
