[ https://issues.apache.org/jira/browse/SPARK-14091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-14091:
------------------------------------
Assignee: Apache Spark
> Consider improving performance of SparkContext.getCallSite()
> ------------------------------------------------------------
>
> Key: SPARK-14091
> URL: https://issues.apache.org/jira/browse/SPARK-14091
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Rajesh Balamohan
> Assignee: Apache Spark
>
> Currently SparkContext.getCallSite() makes a call to Utils.getCallSite().
> {noformat}
> private[spark] def getCallSite(): CallSite = {
>   val callSite = Utils.getCallSite()
>   CallSite(
>     Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
>     Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
>   )
> }
> {noformat}
> However, in some places Utils.withDummyCallSite(sc) is invoked to avoid
> expensive thread dumps within getCallSite(). But Utils.getCallSite() is
> evaluated first, before the local properties are checked, so the thread dump
> is computed anyway. This has a noticeable impact when lots of RDDs are
> created (e.g. close to 3-7 seconds are spent when 1000+ RDDs are present,
> which is significant when the entire query runtime is on the order of
> 10-20 seconds).
> Creating this JIRA to consider evaluating getCallSite only when needed.
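
One possible way to evaluate the call site only when needed is sketched below. This is an illustration, not the actual Spark change: `CallSite` and `expensiveCallSite()` are hypothetical stand-ins for Spark's `CallSite` and `Utils.getCallSite()`, the two `Option` parameters stand in for the `getLocalProperty` lookups, and the `stackWalks` counter exists only to make the laziness observable. The sketch relies on `Option.getOrElse` taking its default by name, so a `lazy val` is forced only when an override is missing.

```scala
// Hypothetical stand-in for Spark's CallSite; not the real class.
final case class CallSite(shortForm: String, longForm: String)

object LazyCallSiteSketch {
  // Counts how often the "expensive" stack walk runs (illustration only).
  var stackWalks = 0

  // Hypothetical stand-in for Utils.getCallSite(): captures the current stack trace.
  def expensiveCallSite(): CallSite = {
    stackWalks += 1
    val frames = Thread.currentThread().getStackTrace
    val top = frames.headOption.map(_.toString).getOrElse("<unknown>")
    CallSite(top, frames.mkString("\n"))
  }

  // The overrides play the role of the SHORT_FORM / LONG_FORM local properties.
  def getCallSite(shortOverride: Option[String], longOverride: Option[String]): CallSite = {
    // lazy val: the stack walk is deferred until first use and runs at most once.
    lazy val callSite = expensiveCallSite()
    CallSite(
      // getOrElse evaluates its argument only when the Option is empty.
      shortOverride.getOrElse(callSite.shortForm),
      longOverride.getOrElse(callSite.longForm)
    )
  }
}
```

With both overrides set (the dummy call site case), `expensiveCallSite()` is never invoked; with either one missing, it is invoked once and the result is reused for both fields.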
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)