Repository: spark
Updated Branches:
  refs/heads/branch-1.1 ec0b91edd -> 55e9dd637


[SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins

BroadcastHashJoin has a broadcastFuture variable that tries to collect
the broadcasted table in a separate thread, but this doesn't help
because it's a lazy val that only gets initialized when you attempt to
build the RDD. Thus queries that broadcast multiple tables would collect
and broadcast them sequentially. I changed this to a val to let it start
collecting right when the operator is created.

Author: Matei Zaharia <[email protected]>

Closes #1990 from mateiz/spark-3084 and squashes the following commits:

f468766 [Matei Zaharia] [SPARK-3084] Collect broadcasted tables in parallel in 
joins

(cherry picked from commit 6a13dca12fac06f3af892ffcc8922cc84f91b786)
Signed-off-by: Michael Armbrust <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/55e9dd63
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/55e9dd63
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/55e9dd63

Branch: refs/heads/branch-1.1
Commit: 55e9dd637bdef3a2acf56af95410219e23c9502a
Parents: ec0b91e
Author: Matei Zaharia <[email protected]>
Authored: Mon Aug 18 10:05:52 2014 -0700
Committer: Michael Armbrust <[email protected]>
Committed: Mon Aug 18 10:06:07 2014 -0700

----------------------------------------------------------------------
 sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/55e9dd63/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
index c86811e..481bb8c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
@@ -424,7 +424,7 @@ case class BroadcastHashJoin(
     UnspecifiedDistribution :: UnspecifiedDistribution :: Nil
 
   @transient
-  lazy val broadcastFuture = future {
+  val broadcastFuture = future {
     sparkContext.broadcast(buildPlan.executeCollect())
   }
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to