[jira] [Updated] (SPARK-7970) Optimize code for SQL queries fired on Union of RDDs (closure cleaner)

Nitin Goyal (JIRA) Sat, 30 May 2015 11:29:43 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Nitin Goyal updated SPARK-7970:
-------------------------------
    Attachment: Screen Shot 2015-05-27 at 11.01.03 pm.png

> Optimize code for SQL queries fired on Union of RDDs (closure cleaner)
> ----------------------------------------------------------------------
>
>                 Key: SPARK-7970
>                 URL: https://issues.apache.org/jira/browse/SPARK-7970
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Nitin Goyal
>         Attachments: Screen Shot 2015-05-27 at 11.01.03 pm.png, Screen Shot 
> 2015-05-27 at 11.07.02 pm.png
>
>
> The closure cleaner slows down Spark SQL queries fired on a union of RDDs. 
> The time spent on the driver side increases linearly with the number of 
> RDDs unioned. Refer to the following thread for more context :-
> http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tt12466.html
> As the attached JProfiler screenshots show, a lot of time is spent in the 
> "getClassReader" method of ClosureCleaner and the rest in 
> "ensureSerializable" (at least in my case).
> This can be fixed in two ways (as per my current understanding) :-
> 1. Fix at Spark SQL level - As pointed out by yhuai, we can create 
> MapPartitionsRDD directly instead of calling rdd.mapPartitions, which 
> calls the ClosureCleaner clean method.
> 2. Fix at Spark core level -
>   (i) Make "checkSerializable" property-driven in SparkContext's clean method
>   (ii) Somehow cache the class reader for the last 'n' classes
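
The two core-level ideas in option 2 can be illustrated with a minimal, stdlib-only Java sketch. This is not Spark code: the class CleanerSketch, its methods, and its constructor parameters are all hypothetical names chosen for illustration. It assumes the repeated cost comes from re-reading the same class files, and that the serializability probe is safe to skip when a flag says so.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: not Spark code. All names here are hypothetical. */
class CleanerSketch {
    private final int maxEntries;
    private final boolean checkSerializable;
    private final Map<Class<?>, byte[]> classBytes;
    private int loads = 0;

    CleanerSketch(int maxEntries, boolean checkSerializable) {
        this.maxEntries = maxEntries;
        this.checkSerializable = checkSerializable;
        // accessOrder = true keeps entries ordered least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the cap is exceeded.
        this.classBytes = new LinkedHashMap<Class<?>, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Class<?>, byte[]> eldest) {
                return size() > CleanerSketch.this.maxEntries;
            }
        };
    }

    /** Idea (ii): return cached class-file bytes, reading them only on a miss. */
    synchronized byte[] classBytesFor(Class<?> cls) {
        byte[] bytes = classBytes.get(cls);
        if (bytes == null) {
            bytes = readClassBytes(cls);
            loads++;
            classBytes.put(cls, bytes);
        }
        return bytes;
    }

    /** Number of times a class file was actually read (cache misses). */
    synchronized int loadCount() {
        return loads;
    }

    /** Idea (i): run the serializability probe only when the flag is on.
     *  Returns true if the probe ran, false if it was skipped. */
    boolean maybeEnsureSerializable(Object closure) {
        if (!checkSerializable) {
            return false;
        }
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(closure);
        } catch (IOException e) {
            // Covers NotSerializableException, which is an IOException.
            throw new UncheckedIOException(e);
        }
        return true;
    }

    /** Reads the raw .class file of cls through its own class loader. */
    private static byte[] readClassBytes(Class<?> cls) {
        String resource = cls.getName().replaceFirst("^.*\\.", "") + ".class";
        try (InputStream in = cls.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a cap of 'n' entries, a union of many RDDs that share the same closure classes would hit the cache after the first read, so the per-RDD cost no longer includes re-reading the class file. Whether skipping the serializability probe is acceptable depends on the caller; the flag only makes that choice configurable.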



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
