[ https://issues.apache.org/jira/browse/SPARK-19915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhenhua Wang updated SPARK-19915:
---------------------------------
Description:
1. Cardinality (row count) is usually more important than size, so we can
simplify cost evaluation by using only cardinality. Note that this also means
we no longer need to care about column pruning during reordering; otherwise,
projects would influence the output size of intermediate joins.
2. Doing column pruning during reordering is troublesome. Given the first
change, we can do it right after reordering, so the logic for adding projects
on intermediate joins can be removed. This makes the code simpler and more
reliable.
3. Exclude cartesian products from the "memo". This significantly reduces the
search space and the memory overhead of the memo; otherwise, every combination
of items would exist in the memo. We can find the unjoinable items after
reordering is finished and put them at the end.
was:
Doing column pruning during reordering is troublesome. We can do it right
after reordering, so the logic for adding projects on intermediate joins can
be removed. This makes the code simpler and more reliable.
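Pruning right after reordering can be sketched as one top-down pass over the already-reordered tree: each node keeps only the columns its ancestors need (plus, for a join, the columns its condition references), and a projection is added wherever a node would otherwise output more. This is a hypothetical Python illustration; `Scan`, `Join`, `Project`, `output` and `prune` are made-up names, not Spark's column pruning rule, and the sketch assumes the reordered tree contains only scans and joins.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Scan:
    name: str
    columns: FrozenSet[str]


@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"
    condition_cols: FrozenSet[str]  # columns referenced by the join condition


@dataclass(frozen=True)
class Project:
    columns: FrozenSet[str]
    child: "Node"


Node = Union[Scan, Join, Project]


def output(node: Node) -> FrozenSet[str]:
    """Columns produced by a node."""
    if isinstance(node, Join):
        return output(node.left) | output(node.right)
    return node.columns  # Scan and Project both list their columns


def prune(node: Node, required: FrozenSet[str]) -> Node:
    """Top-down pruning of a reordered Scan/Join tree (assumed to contain no Project yet)."""
    keep = output(node) & required
    if isinstance(node, Join):
        # Children must also supply the columns used by this join's condition.
        needed = required | node.condition_cols
        node = Join(prune(node.left, needed), prune(node.right, needed), node.condition_cols)
    if output(node) != keep:
        node = Project(keep, node)  # drop columns no ancestor uses
    return node


# Hypothetical tables and a tree as it might come out of reordering.
t1 = Scan("t1", frozenset({"a", "x"}))
t2 = Scan("t2", frozenset({"b", "y", "z"}))
t3 = Scan("t3", frozenset({"c", "w"}))
tree = Join(Join(t1, t2, frozenset({"a", "b"})), t3, frozenset({"b", "c"}))

pruned = prune(tree, frozenset({"x", "w"}))
print(sorted(output(pruned)))  # ['w', 'x']
```

Because the pass runs on the finished tree, the reordering search itself never has to create or cost these projections.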
Cardinality is usually more important than size, so we can simplify cost
evaluation by using only cardinality. Note that this means we no longer need
to care about column pruning during reordering (the first point); otherwise,
projects would influence the output size of intermediate joins.
Exclude cartesian products from the "memo". This significantly reduces the
search space and the memory overhead of the memo; otherwise, every combination
of items would exist in the memo. We can find the unjoinable items after
reordering is finished and put them at the end.
> Improve join reorder: simplify cost evaluation, postpone column pruning,
> exclude cartesian product
> --------------------------------------------------------------------------------------------------
>
> Key: SPARK-19915
> URL: https://issues.apache.org/jira/browse/SPARK-19915
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Zhenhua Wang
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)