Zhenhua Wang created SPARK-19915:
------------------------------------

             Summary: Improve join reorder: simplify cost evaluation, postpone column pruning, exclude cartesian product
                 Key: SPARK-19915
                 URL: https://issues.apache.org/jira/browse/SPARK-19915
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Zhenhua Wang
1. Doing column pruning during reordering is troublesome. We can do it right after reordering instead, and then the logic for adding projects on intermediate joins can be removed. This makes the code simpler and more reliable.

2. Cardinality is usually more important than size, so we can simplify cost evaluation by using only cardinality. Note that this also lets us ignore column pruning during reordering (the first point); otherwise, projects would influence the output size of intermediate joins.

3. Exclude cartesian products from the "memo". This significantly reduces the search space and the memory overhead of the memo; otherwise, every combination of items would exist in the memo. We can find the unjoinable items after reordering is finished and put them at the end.
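To make points 2 and 3 concrete, here is a minimal Python sketch of a level-wise dynamic-programming join reorder. It is not Spark's implementation (which is Scala); the names `join_reorder` and `Plan`, the `rows` and `selectivity` inputs, and the independence-based cardinality estimate (product of child cardinalities and join selectivities) are all assumptions for illustration. The cost of a plan is only the sum of the cardinalities of its intermediate joins, a pair of sub-plans with no join condition between them never enters the memo, and unjoinable groups are cross-joined at the end.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Plan(NamedTuple):
    items: FrozenSet[str]  # base items covered by this (sub)plan
    tree: object           # join order as nested tuples, e.g. ("a", ("b", "c"))
    card: float            # estimated output row count
    cost: float            # sum of cardinalities of intermediate joins below the root


def join_reorder(rows: Dict[str, float],
                 selectivity: Dict[FrozenSet[str], float]) -> object:
    """Hypothetical sketch of a cardinality-only, cartesian-free join reorder.

    rows: estimated row count per base item (assumed input).
    selectivity: selectivity of the join condition between two items,
    keyed by the pair; pairs without a key have no join condition.
    """
    names = sorted(rows)
    # memo[k] maps a set of k items to the cheapest plan found for that set.
    memo: List[Dict[FrozenSet[str], Plan]] = [{}]
    memo.append({frozenset([n]): Plan(frozenset([n]), n, rows[n], 0.0)
                 for n in names})

    for level in range(2, len(names) + 1):
        memo.append({})
        for left_size in range(1, level // 2 + 1):
            right_size = level - left_size
            for left in memo[left_size].values():
                for right in memo[right_size].values():
                    if left.items & right.items:
                        continue  # sub-plans must cover disjoint items
                    # Join conditions with one item on each side.
                    conds = [s for pair, s in selectivity.items()
                             if pair & left.items and pair & right.items]
                    if not conds:
                        continue  # cartesian product: never enters the memo
                    card = left.card * right.card
                    for s in conds:
                        card *= s
                    # Only intermediate joins count: a leaf child adds 0,
                    # a join child adds its own output cardinality.
                    cost = (left.cost + right.cost
                            + (left.card if len(left.items) > 1 else 0.0)
                            + (right.card if len(right.items) > 1 else 0.0))
                    key = left.items | right.items
                    best = memo[level].get(key)
                    if best is None or cost < best.cost:
                        memo[level][key] = Plan(key, (left.tree, right.tree),
                                                card, cost)

    # Every connected group of items has a memo entry, so taking the
    # largest groups first recovers the joinable groups; unjoinable
    # groups are then cross-joined at the end.
    remaining = set(names)
    result = None
    for level in range(len(names), 0, -1):
        for key, plan in sorted(memo[level].items(), key=lambda kv: kv[1].cost):
            if key <= remaining:
                remaining -= key
                result = plan.tree if result is None else (result, plan.tree)
    return result
```

For example, with `rows = {"a": 1000, "b": 10, "c": 50, "d": 5}` and selectivities `0.01` for `{a, b}` and `0.1` for `{b, c}`, the sketch returns `(("a", ("b", "c")), "d")`: it joins the smaller intermediate result `b ⋈ c` first, and `d`, which has no join condition with anything, is placed at the end. Because the memo holds only connected item sets, its size grows with the number of joinable subsets rather than with every combination of items.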