Big performance difference when joining 3 tables in different order

Hao Ren Thu, 04 Jun 2015 07:11:31 -0700

Hi,

I encountered a performance issue when joining 3 tables in Spark SQL.


Here is the query:

SELECT g.period, c.categoryName, z.regionName, action, list_id, cnt
FROM t_category c, t_zipcode z, click_meter_site_grouped g
WHERE c.refCategoryID = g.category AND z.regionCode = g.region

I have to pay close attention to the table order in the FROM clause, because:
some orders crash the driver,
some orders make the job extremely slow,
and only one order lets the job finish quickly.
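The WHERE clause links g to c and g to z, but has no predicate between c and z. Assuming the planner joins the tables left-deep in the order they appear in FROM, without reordering them, listing c and z first forces a Cartesian product of those two tables before g is joined at all. The toy Python sketch below counts the intermediate rows for every FROM order. The table sizes and values are made up; this is not Spark code.

```python
from itertools import permutations

# Hypothetical toy tables standing in for t_category (c), t_zipcode (z)
# and click_meter_site_grouped (g); sizes and values are made up.
TABLES = {
    "c": [{"refCategoryID": i} for i in range(50)],
    "z": [{"regionCode": r} for r in range(30)],
    "g": [{"category": i % 50, "region": i % 30} for i in range(200)],
}

# Equi-join predicates from the WHERE clause, keyed by the aliases they connect.
# There is no predicate between c and z.
PREDICATES = {
    frozenset("cg"): ("c", "refCategoryID", "g", "category"),
    frozenset("zg"): ("z", "regionCode", "g", "region"),
}


def left_deep_join(order):
    """Join tables strictly in the given order; return each intermediate row count."""
    joined = [{order[0]: row} for row in TABLES[order[0]]]
    seen = {order[0]}
    sizes = []
    for alias in order[1:]:
        # Only predicates linking the new table to already-joined tables apply;
        # with none, this step degenerates into a Cartesian product.
        preds = [PREDICATES[frozenset((alias, s))]
                 for s in seen if frozenset((alias, s)) in PREDICATES]
        out = []
        for left in joined:
            for right in TABLES[alias]:
                row = {**left, alias: right}
                if all(row[a1][k1] == row[a2][k2] for a1, k1, a2, k2 in preds):
                    out.append(row)
        joined = out
        seen.add(alias)
        sizes.append(len(joined))
    return sizes


for order in permutations("czg"):
    print("".join(order), left_deep_join(order))
```

Every order ends with the same 200 rows, but the orders starting with c and z build 50 x 30 = 1500 intermediate rows first; with real table sizes that product can be enormous.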

In the slow case, I noticed that one table is loaded from its CSV file 56 times!
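One possible explanation for the repeated loads, not confirmed here: an RDD cartesian product (as built by `RDD.cartesian`) has one partition per pair of input partitions, and each of those tasks computes both of its input partitions. If an input is backed directly by a file and not cached, it is re-read from the file many times. The sketch below only counts those reads; the partition counts are hypothetical.

```python
from collections import Counter


def cartesian_reads(n_left, n_right, cached=False):
    """Count how often input partitions are computed when every
    (left, right) partition pair is processed by its own task."""
    reads = Counter()
    in_cache = set()
    for i in range(n_left):
        for j in range(n_right):
            # Each pair task needs left partition i and right partition j.
            for part in (("left", i), ("right", j)):
                if cached and part in in_cache:
                    continue  # served from cache, no re-read of the source file
                reads[part[0]] += 1
                if cached:
                    in_cache.add(part)
    return reads


print(cartesian_reads(4, 3))               # uncached: every pair task re-reads both inputs
print(cartesian_reads(4, 3, cached=True))  # cached: each partition is read once
```

Caching the small tables (for example with `sqlContext.cacheTable(...)`) roughly corresponds to the `cached=True` case, although it does not remove the cost of the Cartesian product itself.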

I would like to know more about how joins are implemented in Spark SQL, to understand the issue (auto broadcast, etc.).
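On the auto-broadcast side: Spark SQL can broadcast a join side whose estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default; -1 disables it). Each executor then builds a hash table from the small side and probes it with the large side, so the large side is not shuffled. The sketch below is a simplified, single-process model of that decision and of a broadcast hash join. It is not Spark's planner code, and the function names and rows are hypothetical.

```python
DEFAULT_THRESHOLD = 10 * 1024 * 1024  # default of spark.sql.autoBroadcastJoinThreshold, in bytes


def choose_join(left_bytes, right_bytes, threshold=DEFAULT_THRESHOLD):
    """Toy rule: broadcast the smaller side if its estimated size fits under the threshold."""
    if threshold > 0 and min(left_bytes, right_bytes) <= threshold:
        return "broadcast_hash_join"
    return "shuffle_join"


def broadcast_hash_join(big_rows, small_rows, big_key, small_key):
    """Build a hash table on the small side once, then stream the big side through it."""
    table = {}
    for row in small_rows:
        table.setdefault(row[small_key], []).append(row)
    # Probe: the big side is only looked up row by row, never shuffled.
    return [{**match, **row} for row in big_rows for match in table.get(row[big_key], [])]


categories = [{"refCategoryID": 1, "categoryName": "A"},
              {"refCategoryID": 2, "categoryName": "B"}]
clicks = [{"category": 1, "cnt": 3}, {"category": 2, "cnt": 4}, {"category": 9, "cnt": 5}]

print(choose_join(left_bytes=2_000, right_bytes=5 * 1024**3))
print(broadcast_hash_join(clicks, categories, "category", "refCategoryID"))
```

To compare real plans, the threshold can be changed with `SET spark.sql.autoBroadcastJoinThreshold=-1`, and calling `explain()` on the DataFrame returned by `sqlContext.sql(...)` prints the physical plan Spark chose for each table order.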

For those who want more details, here is the JIRA issue:
https://issues.apache.org/jira/browse/SPARK-8102

Any help is welcome. =) Thx

Hao



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Big-performance-difference-when-joining-3-tables-in-different-order-tp23150.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
