[ 
https://issues.apache.org/jira/browse/TEZ-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116314#comment-14116314
 ] 

Gopal V commented on TEZ-1157:
------------------------------

Tested with the WIP patch

{code}
$ yarn logs -applicationId application_1408148409003_0490 | grep HttpConn | wc -l
2000

$ yarn logs -applicationId application_1408148409003_0491 | grep HttpConn | wc -l
19
{code}

Run times are about the same: the run with the patch enabled was about half a 
second slower, possibly because the tests ran on a single-rack 10GbE cluster.

> Optimize broadcast :- Tasks pertaining to same job in same machine should not 
> download multiple copies of broadcast data
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1157
>                 URL: https://issues.apache.org/jira/browse/TEZ-1157
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: Gopal V
>              Labels: performance
>         Attachments: TEZ-1152.WIP.patch, TEZ-1157.3.WIP.patch, 
> TEZ-1157.4.WIP.patch, TEZ-1157.5.WIP.patch, 
> TEZ-broadcast-shuffle+vertex-parallelism.patch
>
>
> Currently, tasks belonging to the same job and running on the same machine 
> each download their own copy of the broadcast data. An optimization would be 
> to download one copy per machine and have the remaining tasks refer to that 
> copy.
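
The "download once, share with the rest" idea in the description can be 
sketched roughly as follows. This is only an illustration and not the code in 
the attached patches: SharedBroadcastCache, its fetcher callback, and 
downloadCount() are hypothetical names, and the sketch shares data between 
callers inside a single JVM, whereas tasks in separate containers would need 
a shared location on the machine (local disk, for example).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch (not a Tez API): the first caller asking for a given
// broadcast input triggers the download; later callers reuse the cached copy.
final class SharedBroadcastCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final AtomicInteger downloads = new AtomicInteger();
    // Assumed stand-in for the real remote fetch of one broadcast input.
    private final Function<String, byte[]> fetcher;

    SharedBroadcastCache(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    byte[] get(String inputId) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once
        // per key; concurrent callers for the same key wait for that result.
        return cache.computeIfAbsent(inputId, id -> {
            downloads.incrementAndGet();
            return fetcher.apply(id);
        });
    }

    // Number of fetches actually performed, for comparison against requests.
    int downloadCount() {
        return downloads.get();
    }
}
```

With a cache like this, any number of get() calls for the same input on one 
machine would result in a single fetch, which is the kind of drop in 
connection count the comparison above is meant to measure.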



--
This message was sent by Atlassian JIRA
(v6.2#6252)
