[ https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138507#comment-14138507 ]
Gopal V commented on TEZ-1157:
------------------------------
Committed to trunk - thanks [~sseth], [~rajesh.balamohan] for the reviews.
{{code}}
commit 625450cf11454fa9697a902ba70367de00cdc170
Author: Gopal V <[email protected]>
Date: Wed Sep 17 20:53:11 2014 -0700
TEZ-1157. Optimize broadcast shuffle to download data only once per host.
(gopalv)
{{code}}
> Optimize broadcast :- Tasks pertaining to same job in same machine should not
> download multiple copies of broadcast data
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: TEZ-1157
> URL: https://issues.apache.org/jira/browse/TEZ-1157
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Rajesh Balamohan
> Assignee: Gopal V
> Labels: performance
> Attachments: TEZ-1152.WIP.patch, TEZ-1157.10.patch,
> TEZ-1157.3.WIP.patch, TEZ-1157.4.WIP.patch, TEZ-1157.5.WIP.patch,
> TEZ-1157.6.patch, TEZ-1157.7.patch, TEZ-1157.8.patch, TEZ-1157.9.patch,
> TEZ-broadcast-shuffle+vertex-parallelism.patch, connections.png, latency.png
>
>
> Currently, tasks belonging to the same job and running on the same machine each
> download their own copy of the broadcast data. An optimization would be to download
> one copy per machine and have the remaining tasks refer to that downloaded copy.
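
The "one copy per machine" idea in the description can be illustrated with a minimal sketch. This is not the code from the committed patch: the class name, the `remoteFetch` function, and the in-memory map are all hypothetical stand-ins, and a real implementation would presumably share files on local disk between task processes rather than a map inside one JVM. The sketch only shows that the first task to ask for a broadcast output pays for the download and later tasks reuse the same copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from the Tez code base.
final class HostLocalBroadcastCache {
    // One entry per broadcast output, shared by every task on this host.
    private final ConcurrentHashMap<String, byte[]> copies = new ConcurrentHashMap<>();
    // Counts real downloads so the saving can be observed.
    private final AtomicInteger downloads = new AtomicInteger();
    // Stand-in for the remote shuffle fetch (an assumption, not a Tez API).
    private final Function<String, byte[]> remoteFetch;

    HostLocalBroadcastCache(Function<String, byte[]> remoteFetch) {
        this.remoteFetch = remoteFetch;
    }

    // Returns the host-local copy, downloading it only if no task has done so yet.
    // computeIfAbsent runs the mapping function at most once per key.
    byte[] fetch(String broadcastId) {
        return copies.computeIfAbsent(broadcastId, id -> {
            downloads.incrementAndGet();
            return remoteFetch.apply(id);
        });
    }

    int downloadCount() {
        return downloads.get();
    }

    public static void main(String[] args) {
        HostLocalBroadcastCache cache = new HostLocalBroadcastCache(
            id -> ("data-for-" + id).getBytes(StandardCharsets.UTF_8));
        // Four tasks on the same host ask for the same broadcast output.
        for (int task = 0; task < 4; task++) {
            cache.fetch("broadcast-0");
        }
        System.out.println("downloads: " + cache.downloadCount());
    }
}
```

Keying the cache by a broadcast identifier keeps unrelated outputs independent, so each distinct output is still fetched once per host.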