Repository: spark
Updated Branches:
  refs/heads/master b554b3c46 -> ff7cc45f5


[SPARK-14091][CORE] Improve performance of SparkContext.getCallSite()

Currently SparkContext.getCallSite() makes a call to Utils.getCallSite().

```
 private[spark] def getCallSite(): CallSite = {
    val callSite = Utils.getCallSite()
    CallSite(
      
Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
      Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
    )
  }
```
However, in some places utils.withDummyCallSite(sc) is invoked to avoid 
expensive threaddumps within getCallSite(). But Utils.getCallSite() is 
evaluated earlier causing threaddumps to be computed.

This can have severe impact on smaller queries (that finish in 10-20 seconds) 
having large number of RDDs.

Creating this patch for lazy evaluation of  getCallSite.

No new test cases are added. Following standalone test was tried out manually. 
Also, built entire spark binary and tried with few SQL queries in TPC-DS  and 
TPC-H in multi node cluster
```
def run(): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext("local[1]", "test-context", conf)
    val start: Long = System.currentTimeMillis();
    val confBroadcast = sc.broadcast(new SerializableConfiguration(new 
Configuration()))
    Utils.withDummyCallSite(sc) {
      //Large tables end up creating 5500 RDDs
      for(i <- 1 to 5000) {
       //ignore nulls in RDD as its mainly for testing callSite
        val testRDD = new HadoopRDD(sc, confBroadcast, None, null,
          classOf[NullWritable], classOf[Writable], 10)
      }
    }
    val end: Long = System.currentTimeMillis();
    println("Time taken : " + (end - start))
  }

def main(args: Array[String]): Unit = {
    run
  }
```

Author: Rajesh Balamohan <[email protected]>

Closes #11911 from rajeshbalamohan/SPARK-14091.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ff7cc45f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ff7cc45f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ff7cc45f

Branch: refs/heads/master
Commit: ff7cc45f521c63ce40f955b8995d52a79dca17b4
Parents: b554b3c
Author: Rajesh Balamohan <[email protected]>
Authored: Fri Mar 25 15:09:42 2016 -0700
Committer: Josh Rosen <[email protected]>
Committed: Fri Mar 25 15:09:52 2016 -0700

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/SparkContext.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ff7cc45f/core/src/main/scala/org/apache/spark/SparkContext.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d2cf3bf..dcb41f3 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1737,7 +1737,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
    * has overridden the call site using `setCallSite()`, this will return the 
user's version.
    */
   private[spark] def getCallSite(): CallSite = {
-    val callSite = Utils.getCallSite()
+    lazy val callSite = Utils.getCallSite()
     CallSite(
       
Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
       Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to