[ https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073087#comment-14073087 ]
Gopal V commented on TEZ-1157:
------------------------------
[~bikassaha]: This is actually WIP from a couple of weeks ago.
The branch has moved on, but I pulled the git rev that builds, so that this
reply to the thread on user@tez gets some context.
http://mail-archives.apache.org/mod_mbox/tez-user/201407.mbox/%[email protected]%3E
FYI, the 32k figure is related to the ephemeral port count on the target machine,
and the 1Gb figure is the max split size.
> Optimize broadcast :- Tasks pertaining to same job in same machine should not
> download multiple copies of broadcast data
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: TEZ-1157
> URL: https://issues.apache.org/jira/browse/TEZ-1157
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Labels: performance
> Attachments: TEZ-1152.WIP.patch,
> TEZ-broadcast-shuffle+vertex-parallelism.patch
>
>
> Currently, tasks belonging to the same job and running on the same machine each
> download their own copy of the broadcast data. An optimization would be to download
> one copy per machine and have the rest of the tasks refer to that downloaded copy.
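
For illustration only, the "one copy per machine" idea described above could be sketched roughly as follows. This is not taken from either attached patch and does not use any Tez API. The class SharedBroadcastFetch, the Downloader callback, and the shared-directory layout are all hypothetical names; the sketch assumes the tasks on one host can see a common local directory and coordinate through a lock file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: not the Tez implementation.
public class SharedBroadcastFetch {

    /** Hypothetical callback that writes the broadcast data into the given file. */
    public interface Downloader {
        void downloadTo(Path target) throws IOException;
    }

    /**
     * Returns the path of the host-local copy for the given key, downloading it
     * only if no earlier caller has already done so.
     *
     * The file lock coordinates separate processes on the same host; the
     * synchronized keyword serializes threads inside one JVM, because file
     * locks are held on behalf of the whole JVM rather than a single thread.
     */
    public static synchronized Path fetchOnce(Path sharedDir, String key,
                                              Downloader downloader) throws IOException {
        Files.createDirectories(sharedDir);
        Path data = sharedDir.resolve(key + ".data");
        Path lockFile = sharedDir.resolve(key + ".lock");

        try (FileChannel channel = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            if (!Files.exists(data)) {
                // Download into a temp file first so readers never see a partial copy.
                Path tmp = Files.createTempFile(sharedDir, key, ".tmp");
                downloader.downloadTo(tmp);
                Files.move(tmp, data, StandardCopyOption.ATOMIC_MOVE);
            }
        }
        return data;
    }
}
```

The first task to take the lock performs the download; later tasks find the finished file and just return its path. Cleanup of the shared copy at the end of the job, and error handling for a failed download, are left out of the sketch.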
--
This message was sent by Atlassian JIRA
(v6.2#6252)