Hi All,

I am running a SQL query (Spark version 1.2) on a table created from a
unionAll of 3 SchemaRDDs. It executes in roughly 400ms (about 200ms at the
driver and roughly 200ms at the executors).

If I run the same query on a table created from a unionAll of 27 SchemaRDDs,
the executor time stays the same (because of concurrency and the nature of my
query), but the driver time shoots up to 600ms, for a total query time of
600 + 200 = 800ms.

I attached JProfiler and found that the ClosureCleaner clean method is where
the time goes at the driver (some issue related to URLClassLoader), and that
this time increases linearly with the number of RDDs unioned into the table
the query runs on. As a result the query takes far longer than expected: I
expect it to finish within 400ms irrespective of the number of RDDs, since I
have enough executors available to serve it. Please find below the links to
the JProfiler screenshots:

http://pasteboard.co/MnQtB4o.png

http://pasteboard.co/MnrzHwJ.png
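
As a rough sanity check on the linear growth described above, the two
reported driver times (200ms for 3 RDDs, 600ms for 27 RDDs) can be fitted
with a straight line. This is only a sketch: two points always define a line,
so it does not prove linearity, and the names fit_line and estimate_driver_ms
as well as the 54-RDD extrapolation are hypothetical, not measured.

```python
# Driver-side times reported in this post: (number of unioned RDDs, driver ms).
measurements = [(3, 200.0), (27, 600.0)]

# Executor-side time, reported as roughly constant regardless of RDD count.
EXECUTOR_MS = 200.0


def fit_line(points):
    """Return (slope, intercept) of the line through exactly two points."""
    (x0, y0), (x1, y1) = points
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


def estimate_driver_ms(num_rdds, slope, intercept):
    """Estimate driver time, assuming the cost stays linear in num_rdds."""
    return intercept + slope * num_rdds


slope, intercept = fit_line(measurements)
print(f"per-RDD driver cost: {slope:.2f} ms, fixed driver cost: {intercept:.2f} ms")

# 54 is a hypothetical RDD count used purely for illustration.
for n in (3, 27, 54):
    driver = estimate_driver_ms(n, slope, intercept)
    print(f"{n} RDDs: driver {driver:.0f} ms, total {driver + EXECUTOR_MS:.0f} ms")
```

Under that assumption, each additional unioned RDD adds a fixed amount of
driver time, so the driver side, not the executors, dominates the total as the
number of RDDs grows.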

Any help or suggestions to fix this would be highly appreciated, since this
needs to be fixed for production.

Thanks in advance,
Nitin



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tp12466.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
