[
https://issues.apache.org/jira/browse/SPARK-14026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-14026:
---------------------------------
Labels: bulk-closed (was: )
> Subquery not broadcasted
> ------------------------
>
> Key: SPARK-14026
> URL: https://issues.apache.org/jira/browse/SPARK-14026
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, SQL
> Affects Versions: 1.6.0
> Reporter: Younes
> Priority: Major
> Labels: bulk-closed
>
> The subquery in the following query is not broadcast, and the join generates a very large shuffle.
> Select cnt, tab3.*
> from (Select count(1) cnt, col4 from tab1 join tab2 on col1=col2 group by col4)
> join tab3 on (col4=col3);
> The result set of this subquery is very small, yet it is not broadcast and
> instead creates a huge shuffle:
> - Select count(1) cnt, col4 from tab1 join tab2 on col1=col2 group by col4
> I tried the same query by persisting the subquery, and it worked just fine.
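
The report does not show how the subquery was persisted. The following is only a minimal sketch of what that workaround might look like in PySpark on Spark 1.6. The helper name `join_with_persisted_subquery`, the temporary table name `counts_by_col4`, and the `sql_context` parameter are assumptions made for illustration, not taken from the issue. A likely reason the workaround helps is that a materialized cached relation reports its real size to the planner, but that explanation is also an assumption rather than something the report confirms.

```python
# Sketch only: assumes `sql_context` is a Spark 1.6 SQLContext (or HiveContext)
# in which tab1, tab2 and tab3 are already registered as tables.

# The small aggregate from the report, written with an explicit alias for cnt.
SUBQUERY = (
    "SELECT count(1) AS cnt, col4 "
    "FROM tab1 JOIN tab2 ON col1 = col2 "
    "GROUP BY col4"
)

# The outer join, reading the aggregate from a temp table (hypothetical name)
# instead of an inline subquery.
OUTER_QUERY = (
    "SELECT cnt, tab3.* "
    "FROM counts_by_col4 JOIN tab3 ON (col4 = col3)"
)


def join_with_persisted_subquery(sql_context):
    """Persist the aggregate first, then join it against tab3."""
    counts = sql_context.sql(SUBQUERY).persist()
    # persist() is lazy; run an action so the cache is actually populated
    # before the outer query is planned.
    counts.count()
    # Expose the cached result to SQL under the hypothetical temp table name.
    counts.registerTempTable("counts_by_col4")
    return sql_context.sql(OUTER_QUERY)
```

Calling `join_with_persisted_subquery(sqlContext).explain()` in a shell would show whether the planner picks a broadcast join for the cached side; the outcome still depends on `spark.sql.autoBroadcastJoinThreshold` and the actual size of the aggregate.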