Repository: spark Updated Branches: refs/heads/branch-1.1 ec0b91edd -> 55e9dd637
[SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins BroadcastHashJoin has a broadcastFuture variable that tries to collect the broadcasted table in a separate thread, but this doesn't help because it's a lazy val that only gets initialized when you attempt to build the RDD. Thus queries that broadcast multiple tables would collect and broadcast them sequentially. I changed this to a val to let it start collecting right when the operator is created. Author: Matei Zaharia <[email protected]> Closes #1990 from mateiz/spark-3084 and squashes the following commits: f468766 [Matei Zaharia] [SPARK-3084] Collect broadcasted tables in parallel in joins (cherry picked from commit 6a13dca12fac06f3af892ffcc8922cc84f91b786) Signed-off-by: Michael Armbrust <[email protected]> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/55e9dd63 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/55e9dd63 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/55e9dd63 Branch: refs/heads/branch-1.1 Commit: 55e9dd637bdef3a2acf56af95410219e23c9502a Parents: ec0b91e Author: Matei Zaharia <[email protected]> Authored: Mon Aug 18 10:05:52 2014 -0700 Committer: Michael Armbrust <[email protected]> Committed: Mon Aug 18 10:06:07 2014 -0700 ---------------------------------------------------------------------- sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/spark/blob/55e9dd63/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala ---------------------------------------------------------------------- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala index c86811e..481bb8c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala @@ -424,7 +424,7 @@ case class BroadcastHashJoin( UnspecifiedDistribution :: UnspecifiedDistribution :: Nil @transient - lazy val broadcastFuture = future { + val broadcastFuture = future { sparkContext.broadcast(buildPlan.executeCollect()) } --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
