Thanks for your suggestions.

file.count() takes 7s, so that doesn't seem to be the problem.
Moreover, a union with the same code/CSV takes about 15s (SELECT * FROM
rooms2 UNION SELECT * FROM rooms3).

The web status page shows that both stages 'count at joins.scala:216' and
'reduce at joins.scala:219' take up the majority of the time.
Is this due to bad partitioning or caching? Or is there a problem with the
JOIN operator?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Performance-problems-on-SQL-JOIN-tp8001p8016.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Reply via email to