Hi all,

I am running a SQL query (Spark 1.2) on a table created from a unionAll of 3 SchemaRDDs. It executes in roughly 400 ms (about 200 ms on the driver and about 200 ms on the executors).
If I run the same query on a table created from a unionAll of 27 SchemaRDDs, the executor time stays the same (because of concurrency and the nature of my query), but the driver time shoots up to 600 ms, for a total query time of 600 + 200 = 800 ms.

I attached JProfiler and found that ClosureCleaner's clean method is taking the time on the driver (some issue related to URLClassLoader), and that this time increases linearly with the number of RDDs in the union the query runs against. As a result the query takes much longer than the roughly 400 ms I expect regardless of the number of RDDs, since I have enough executors available to handle the load.

Screenshots from JProfiler:
http://pasteboard.co/MnQtB4o.png
http://pasteboard.co/MnrzHwJ.png

Any help or suggestions to fix this would be highly appreciated, since this needs to be fixed for production.

Thanks in advance,
Nitin

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tp12466.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.