[ 
https://issues.apache.org/jira/browse/TEZ-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011041#comment-14011041
 ] 

Rajesh Balamohan commented on TEZ-1152:
---------------------------------------

Since #1 and #2 need to be fixed independently, I will create sub tasks.

> Optimize broadcast join for scalability
> ---------------------------------------
>
>                 Key: TEZ-1152
>                 URL: https://issues.apache.org/jira/browse/TEZ-1152
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>              Labels: performance, scalability
>
> Two main issues for large queries using broadcast shuffle
> 1. Lots of tasks communicate to same node for downloading shuffle data. So 
> most of the time, single machine will be overloaded with requests.
> 2. Tasks pertaining to same job (in the same machine) downloads broadcast 
> shuffle data redundantly.  If the data can be copied to temp storage or 
> ramfs, other tasks running in the same machine can refer to the local copy.  
> Optimizing this would help when running multiple queries in parallel in the 
> cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to