spark git commit: [SPARK-4612] Reduce task latency and increase scheduling throughput by making configuration initialization lazy

rxin Tue, 25 Nov 2014 23:17:13 -0800

Repository: spark
Updated Branches:
  refs/heads/branch-1.2 69d021b0b -> e8669729a



[SPARK-4612] Reduce task latency and increase scheduling throughput by making 
configuration initialization lazy

Executor.updateDependencies
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L337)
creates a Hadoop configuration object for every task that is launched, even if
there is no new dependent file/JAR to update. This creation is heavy-weight and
should be avoided when there is nothing to update. This PR makes the creation
lazy, so the object is built only when it is first needed.

A quick local run of the spark-perf scheduling throughput tests in a local
standalone scheduler mode gives the following numbers:
1 job with 10000 tasks: before 7.8395 seconds, after 2.6415 seconds = roughly 3x
increase in task scheduling throughput (7.8395 / 2.6415 is about 2.97)
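
As a rough illustration of why a lazy val helps here, the following is a minimal
sketch and not the Spark code itself. ExpensiveConf, fetch and LazyConfDemo are
hypothetical stand-ins for the Hadoop configuration, the file fetch and the
executor; only the shape of updateDependencies follows the diff. The counter
shows that the object is never constructed when no file is stale, and is
constructed once when at least one is.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration only; none of these are Spark APIs.
object LazyConfDemo {
  // Counts how many times the "heavy" configuration object is constructed.
  var constructions = 0

  final class ExpensiveConf {
    constructions += 1
  }

  // Placeholder for the real fetch; it only needs to receive the configuration.
  def fetch(name: String, conf: ExpensiveConf): Unit = ()

  // Same shape as the patched method: the configuration is declared up front
  // but only built if the loop body actually touches it.
  def updateDependencies(
      newFiles: Map[String, Long],
      currentFiles: mutable.Map[String, Long]): Unit = {
    lazy val conf = new ExpensiveConf // not constructed until first access
    for ((name, timestamp) <- newFiles if currentFiles.getOrElse(name, -1L) < timestamp) {
      fetch(name, conf) // first access forces construction, later ones reuse it
      currentFiles(name) = timestamp
    }
  }

  def main(args: Array[String]): Unit = {
    val current = mutable.Map("a.jar" -> 1L)
    updateDependencies(Map("a.jar" -> 1L), current) // nothing stale
    println(constructions) // prints 0
    updateDependencies(Map("a.jar" -> 2L, "b.jar" -> 1L), current) // two stale entries
    println(constructions) // prints 1
  }
}
```

With a plain val instead of lazy val, the first call above would already
construct one ExpensiveConf, which is the per-task cost the patch avoids.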

pwendell JoshRosen

Author: Tathagata Das <[email protected]>

Closes #3463 from tdas/lazy-config and squashes the following commits:

c791c1e [Tathagata Das] Reduce task latency by making configuration 
initialization lazy

(cherry picked from commit e7f4d2534bb3361ec4b7af0d42bc798a7a425226)
Signed-off-by: Reynold Xin <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e8669729
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e8669729
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e8669729

Branch: refs/heads/branch-1.2
Commit: e8669729af4b49423a7514830436b2cb4ee6a08a
Parents: 69d021b
Author: Tathagata Das <[email protected]>
Authored: Tue Nov 25 23:15:58 2014 -0800
Committer: Reynold Xin <[email protected]>
Committed: Tue Nov 25 23:16:14 2014 -0800

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/executor/Executor.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/e8669729/core/src/main/scala/org/apache/spark/executor/Executor.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index 5fa5845..835157f 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -334,7 +334,7 @@ private[spark] class Executor(
    * SparkContext. Also adds any new JARs we fetched to the class loader.
    */
   private def updateDependencies(newFiles: HashMap[String, Long], newJars: HashMap[String, Long]) {
-    val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
+    lazy val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
     synchronized {
       // Fetch missing dependencies
       for ((name, timestamp) <- newFiles if currentFiles.getOrElse(name, -1L) < timestamp) {


