Zhenhua Wang created SPARK-19915:
------------------------------------

             Summary: Improve join reorder: simplify cost evaluation, postpone column pruning, exclude cartesian product
                 Key: SPARK-19915
                 URL: https://issues.apache.org/jira/browse/SPARK-19915
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Zhenhua Wang


1. Doing column pruning during reordering is troublesome. We can do it right
   after reordering instead; then the logic for adding projects on intermediate
   joins can be removed. This makes the code simpler and more reliable.
2. Cardinality is usually more important than size, so we can simplify cost
   evaluation by using cardinality only. Note that this is what allows us to
   ignore column pruning during reordering (the first point); otherwise,
   projects would influence the output size of intermediate joins.
3. Exclude cartesian products from the "memo". This significantly reduces the
   search space and the memory overhead of the memo; otherwise, every
   combination of items would exist in the memo. We can find the unjoinable
   items after reordering has finished and put them at the end.
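
To illustrate how the second and third points fit together, here is a minimal Python sketch of a bottom-up dynamic-programming join search. It is a simplification and not Spark's implementation. It assumes hypothetical inputs: a row count per item and a selectivity per joinable pair of items, with predicates treated as independent. The names `reorder`, `cards` and `edges` are illustrative only.

The sketch works as follows:
- Cost is the sum of join output rows, which is cardinality only.
- A pair of sub-plans with no join condition between them never enters the memo.
- Items left uncovered are appended at the end with cartesian products.

Column pruning (the first point) would then run once on the returned tree. That step is not shown.

```python
def reorder(cards, edges):
    """Bottom-up DP join reorder sketch (illustrative, not Spark's code).

    cards: dict mapping item name -> estimated row count (hypothetical input)
    edges: dict mapping frozenset({a, b}) -> selectivity of the join
           condition between items a and b (hypothetical input)
    Returns (plan, memo); plan is a nested (left, right) tuple join tree.
    """
    if not cards:
        raise ValueError("need at least one item")
    items = list(cards)
    # memo: set of items -> (plan, output row count, cost).
    # Single items cost nothing; only join output rows are counted.
    memo = {frozenset([i]): (i, cards[i], 0) for i in items}

    def connected(left, right):
        # True if at least one join condition links the two sides.
        return any(frozenset((a, b)) in edges for a in left for b in right)

    def join_rows(left, lrows, right, rrows):
        # Output cardinality: cross-product rows times the selectivity of
        # every condition spanning the two sides (1.0 where there is none).
        rows = lrows * rrows
        for a in left:
            for b in right:
                rows *= edges.get(frozenset((a, b)), 1.0)
        return rows

    for level in range(2, len(items) + 1):
        new_plans = {}
        keys = list(memo)
        for left in keys:
            for right in keys:
                if len(left) + len(right) != level or left & right:
                    continue
                if not connected(left, right):
                    # Third point: a cartesian product never enters the memo.
                    continue
                lplan, lrows, lcost = memo[left]
                rplan, rrows, rcost = memo[right]
                rows = join_rows(left, lrows, right, rrows)
                # Second point: cost uses cardinality only (sum of join rows).
                cost = lcost + rcost + rows
                key = left | right
                best = new_plans.get(key)
                if best is None or cost < best[2]:
                    new_plans[key] = ((lplan, rplan), rows, cost)
        memo.update(new_plans)

    # Cover all items with the largest (then cheapest) connected plans found,
    # then put the unjoinable leftovers at the end via cartesian products.
    remaining = frozenset(items)
    parts = []
    while remaining:
        key = max((k for k in memo if k <= remaining),
                  key=lambda k: (len(k), -memo[k][2]))
        parts.append(memo[key][0])
        remaining -= key
    plan = parts[0]
    for part in parts[1:]:
        plan = (plan, part)
    return plan, memo


# Hypothetical example: a fact table f joined to dimensions x and y,
# plus an item z that has no join condition with anything.
cards = {"f": 1_000_000, "x": 100, "y": 10, "z": 50}
edges = {frozenset(("f", "x")): 0.0001, frozenset(("f", "y")): 0.01}
plan, memo = reorder(cards, edges)
print(plan)
```

With these made-up numbers the sketch returns `(('y', ('f', 'x')), 'z')`:
- f is joined first with x, the more selective dimension.
- y follows.
- z, which is unjoinable, is appended last.

The pair {x, y} is never stored in the memo because it would be a cartesian product. The top join's row count is the same for every plan over the same set of items, so counting it does not change which plan wins.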



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
