[ 
https://issues.apache.org/jira/browse/SPARK-19915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenhua Wang updated SPARK-19915:
---------------------------------
    Description: 
1. Usually cardinality is more important than size, so we can simplify cost 
evaluation by using only cardinality. Note that this also lets us ignore 
column pruning during reordering, because otherwise projects would influence 
the output size of intermediate joins.
2. Doing column pruning during reordering is troublesome. Given the first 
change, we can do it right after reordering, and the logic for adding projects 
on intermediate joins can then be removed. This makes the code simpler and 
more reliable.
3. Exclude cartesian products from the "memo". This significantly reduces the 
search space and the memory overhead of the memo; otherwise every combination 
of items would exist in the memo. We can find the unjoinable items after 
reordering is finished and put them at the end.

  was:
Doing column pruning during reordering is troublesome. We can do it right 
after reordering, and the logic for adding projects on intermediate joins can 
then be removed. This makes the code simpler and more reliable.
Usually cardinality is more important than size, so we can simplify cost 
evaluation by using only cardinality. Note that this lets us ignore column 
pruning during reordering (the first point); otherwise, projects would 
influence the output size of intermediate joins.
Exclude cartesian products from the "memo". This significantly reduces the 
search space and the memory overhead of the memo; otherwise every combination 
of items would exist in the memo. We can find the unjoinable items after 
reordering is finished and put them at the end.


> Improve join reorder: simplify cost evaluation, postpone column pruning, 
> exclude cartesian product
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19915
>                 URL: https://issues.apache.org/jira/browse/SPARK-19915
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Zhenhua Wang
>
> 1. Usually cardinality is more important than size, so we can simplify cost 
> evaluation by using only cardinality. Note that this also lets us ignore 
> column pruning during reordering, because otherwise projects would influence 
> the output size of intermediate joins.
> 2. Doing column pruning during reordering is troublesome. Given the first 
> change, we can do it right after reordering, and the logic for adding 
> projects on intermediate joins can then be removed. This makes the code 
> simpler and more reliable.
> 3. Exclude cartesian products from the "memo". This significantly reduces the 
> search space and the memory overhead of the memo; otherwise every combination 
> of items would exist in the memo. We can find the unjoinable items after 
> reordering is finished and put them at the end.
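
The three points above can be illustrated with a small dynamic-programming 
sketch. This is not Spark's implementation; it is a minimal illustration under 
stated assumptions. The function name `reorder_joins`, the `rows` and 
`selectivity` inputs, and the nested-tuple plan representation are all 
hypothetical. Cost is taken to be the sum of the estimated cardinalities of 
the joins in a plan (no size term), two sub-plans are combined only when a 
join condition connects them, and items that never become joinable are 
appended at the end.

```python
from itertools import combinations


def reorder_joins(rows, selectivity):
    """Toy join reordering (hypothetical helper, not a Spark API).

    rows:        {item name: estimated row count}
    selectivity: {frozenset({a, b}): selectivity of the join condition a-b}
    Returns (plan, cost), where plan is a nested tuple and cost is the sum
    of the estimated cardinalities of the joins in the reordered part.
    """
    items = list(rows)
    # Level 1 of the memo: single items, which cost nothing.
    memo = {frozenset([name]): (name, float(n), 0.0) for name, n in rows.items()}

    for level in range(2, len(items) + 1):
        for subset in map(frozenset, combinations(items, level)):
            best = None
            for k in range(1, level // 2 + 1):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    if left not in memo or right not in memo:
                        continue
                    # Join conditions that connect the two sides.
                    sels = [s for pair, s in selectivity.items()
                            if pair & left and pair & right]
                    if not sels:
                        continue  # cartesian product: keep it out of the memo
                    lplan, lcard, lcost = memo[left]
                    rplan, rcard, rcost = memo[right]
                    card = lcard * rcard
                    for s in sels:
                        card *= s
                    cost = lcost + rcost + card  # cardinality only, no size
                    if best is None or cost < best[2]:
                        best = ((lplan, rplan), card, cost)
            if best is not None:
                memo[subset] = best

    # Simplification: take the largest joinable group (cheapest on ties),
    # then put the remaining, unjoinable items at the end.
    joined = max(memo, key=lambda s: (len(s), -memo[s][2]))
    plan, _, cost = memo[joined]
    for name in items:
        if name not in joined:
            plan = (plan, name)
    return plan, cost
```

Because no projection is involved in the cost, column pruning does not need 
to happen inside this loop; in this sketch it would simply be a separate pass 
over the returned plan. Note also that subsets such as {a, c} with no 
connecting condition never enter `memo`, which is where the reduction in 
search space comes from.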


