Younes created SPARK-14026:
------------------------------

Summary: Subquery not broadcast
Key: SPARK-14026
URL: https://issues.apache.org/jira/browse/SPARK-14026
Project: Spark
Issue Type: Bug
Components: Optimizer, SQL
Affects Versions: 1.6.0
Reporter: Younes
The result of a subquery is not broadcast, so the join against it generates a very large shuffle:

    Select cnt, tab3.*
    from (Select count(1) cnt, col4 from tab1 join tab2 on col1=col2 group by col4)
    join tab3 on (col4=col3);

The following subquery returns a very small result set. Even so, it is not broadcast, and the outer join creates a huge shuffle:

    Select count(1) cnt, col4 from tab1 join tab2 on col1=col2 group by col4

When I ran the same query with the subquery persisted first, it worked fine.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
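The sketch below is a toy model of one plausible explanation for why persisting helps. It is hypothetical: it is not Spark's code, and its sizing rule is an assumption to check against the 1.6 source.

- **Threshold assumption.** A join side is broadcast only when its estimated size is at or below `spark.sql.autoBroadcastJoinThreshold`, whose default is 10 MB.
- **Estimation assumption.** When a node has no statistics, the planner estimates its size as the product of its children's estimates, so a `GROUP BY` over a join never looks small.
- **Persisted data.** A persisted result carries a measured size instead of an estimate.

The class `Plan`, the function `can_broadcast`, and all table sizes are invented for illustration.

```python
# Hypothetical toy model of a broadcast-join decision; NOT Spark's code.
# Assumed: non-leaf nodes without statistics are sized as the product of
# their children's estimates, and broadcast requires size <= threshold.

# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB), in bytes.
AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024


class Plan:
    """A node in a toy logical plan, optionally with a measured size."""

    def __init__(self, name, children=(), measured_bytes=None):
        self.name = name
        self.children = list(children)
        # Base tables and persisted results carry a known size in bytes.
        self.measured_bytes = measured_bytes

    def estimated_bytes(self):
        if self.measured_bytes is not None:
            return self.measured_bytes
        # Assumed fallback: multiply the children's estimates. This cannot
        # see that an aggregation shrinks its input.
        size = 1
        for child in self.children:
            size *= child.estimated_bytes()
        return size


def can_broadcast(plan, threshold=AUTO_BROADCAST_THRESHOLD):
    """Broadcast a join side only if its estimate fits under the threshold."""
    return plan.estimated_bytes() <= threshold


if __name__ == "__main__":
    # Hypothetical input sizes for tab1 and tab2.
    tab1 = Plan("tab1", measured_bytes=50 * 1024**3)
    tab2 = Plan("tab2", measured_bytes=1 * 1024**3)
    subquery = Plan("group by col4", [Plan("tab1 join tab2", [tab1, tab2])])
    print("subquery broadcast?", can_broadcast(subquery))

    # After persisting, the small actual result size is known.
    persisted = Plan("persisted subquery", measured_bytes=2048)
    print("persisted broadcast?", can_broadcast(persisted))
```

Under these assumptions, the unpersisted subquery's estimate is astronomically larger than its real output, so it falls back to a shuffle join. The persisted copy reports its true, small size and qualifies for broadcast.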