Zhenhua Wang created SPARK-19915:
------------------------------------
Summary: Improve join reorder: simplify cost evaluation, postpone
column pruning, exclude cartesian product
Key: SPARK-19915
URL: https://issues.apache.org/jira/browse/SPARK-19915
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 2.2.0
Reporter: Zhenhua Wang
Doing column pruning during reordering is troublesome. We can do it right after
reordering instead; the logic for adding projects on intermediate joins can then
be removed. This makes the code simpler and more reliable.
Cardinality is usually more important than size, so we can simplify cost
evaluation by using only cardinality. Note that this lets us ignore
column pruning during reordering (the first point); otherwise, projects
would influence the output size of intermediate joins.
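
As a rough illustration of a cardinality-only cost, the sketch below compares two
join orders over three relations. It is not Spark's implementation: the names
Plan, leaf and join, the row counts, and the textbook estimate
|L| * |R| / ndv(key) are all assumptions made for this example. No column widths
appear anywhere in the cost, so projects cannot affect the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    items: frozenset  # names of the base relations joined so far
    rows: int         # estimated output cardinality
    cost: int         # sum of the cardinalities of all joins in this plan


def leaf(name, rows):
    # A base relation costs nothing by itself in this simplified model.
    return Plan(frozenset([name]), rows, 0)


def join(left, right, key_ndv):
    # Assumed equi-join estimate: |L| * |R| / ndv(join key), in integer math.
    rows = left.rows * right.rows // key_ndv
    return Plan(left.items | right.items, rows, left.cost + right.cost + rows)


a, b, c = leaf("a", 1000), leaf("b", 100), leaf("c", 10)
ab_then_c = join(join(a, b, 100), c, 10)   # (a JOIN b) JOIN c
bc_then_a = join(a, join(b, c, 10), 100)   # a JOIN (b JOIN c)
print(ab_then_c.cost, bc_then_a.cost)      # prints: 2000 1100
```

Both orders produce the same final cardinality, but the second keeps the
intermediate result small, which is what the cardinality-only cost rewards.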
Exclude cartesian products from the "memo". This significantly reduces the search
space and the memory overhead of the memo; otherwise, every combination of items
will exist in the memo. We can find the unjoinable items after reordering is
finished and put them at the end.
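
The effect on the memo can be sketched with a small level-by-level dynamic
program. This is a simplified model, not the Spark code: reorder_joins, the
rows and conditions inputs, and the cardinality estimate are hypothetical. A pair
of sub-plans is only combined when some join condition connects them, and items
that never become joinable are reported separately so that they can be appended
at the end.

```python
def reorder_joins(rows, conditions):
    """rows: item name -> row count.
    conditions: frozenset({x, y}) -> assumed distinct count of the join key."""
    # memo[k - 1] maps a set of k items to its cheapest (cost, rows, tree).
    memo = [{frozenset([name]): (0, n, name) for name, n in rows.items()}]
    for size in range(2, len(rows) + 1):
        level = {}
        for left_size in range(1, size // 2 + 1):
            for l_items, l_plan in memo[left_size - 1].items():
                for r_items, r_plan in memo[size - left_size - 1].items():
                    if l_items & r_items:
                        continue  # the two sides must not share items
                    ndvs = [ndv for pair, ndv in conditions.items()
                            if len(pair & l_items) == 1 and len(pair & r_items) == 1]
                    if not ndvs:
                        continue  # cartesian product: keep it out of the memo
                    out_rows = l_plan[1] * r_plan[1] // max(ndvs)
                    cost = l_plan[0] + r_plan[0] + out_rows
                    items = l_items | r_items
                    if items not in level or cost < level[items][0]:
                        level[items] = (cost, out_rows, (l_plan[2], r_plan[2]))
        memo.append(level)
    # Take the cheapest plan on the highest non-empty level.
    top = max(k for k, lvl in enumerate(memo) if lvl)
    items, (cost, _, tree) = min(memo[top].items(), key=lambda kv: kv[1][0])
    leftovers = sorted(set(rows) - items)  # unjoinable items, to be put last
    return tree, cost, leftovers, sum(len(lvl) for lvl in memo)


rows = {"a": 1000, "b": 100, "c": 10, "d": 5}
conditions = {frozenset("ab"): 100, frozenset("bc"): 10}  # d joins with nothing
tree, cost, leftovers, memo_size = reorder_joins(rows, conditions)
print(tree, cost, leftovers, memo_size)
```

In this example the memo holds 7 entries, whereas allowing cartesian products
would create an entry for each of the 15 non-empty subsets of the four items.
The item d never enters a join and is returned in leftovers.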