Wang, Gang created SPARK-26375:
----------------------------------
Summary: Rule PruneFileSourcePartitions should be fired before any
other rules based on data size
Key: SPARK-26375
URL: https://issues.apache.org/jira/browse/SPARK-26375
Project: Spark
Issue Type: Improvement
Components: Optimizer
Affects Versions: 2.3.0
Reporter: Wang, Gang
In Catalyst, some optimizer rules rely on table statistics, such as
ReorderJoin (in which star schemas are detected) and CostBasedJoinReorder.
For these rules, the accuracy of the statistics is crucial. However, all of
these rules are currently fired before partition pruning, so they may work
from inaccurate statistics.
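
To illustrate the ordering problem, here is a minimal toy model in Python.
It is not Spark code and does not use Catalyst's API: Relation, JoinPlan,
prune_partitions, order_by_size and the table sizes are all hypothetical
names and numbers made up for this sketch. It only shows that a size-based
rule can reach a different decision depending on whether it runs before or
after partition pruning.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Relation:
    """Hypothetical partitioned table: partition value -> size in bytes."""
    name: str
    partition_sizes: Dict[str, int]
    kept: Optional[Set[str]] = None  # None means pruning has not run yet

    def estimated_size(self) -> int:
        # Before pruning, the estimate covers every partition of the table.
        if self.kept is None:
            return sum(self.partition_sizes.values())
        return sum(s for p, s in self.partition_sizes.items() if p in self.kept)


@dataclass
class JoinPlan:
    """Hypothetical two-way join with per-table partition filters."""
    left: Relation
    right: Relation
    partition_filter: Dict[str, Set[str]]
    join_order: List[str] = field(default_factory=list)


def prune_partitions(plan: JoinPlan) -> JoinPlan:
    # Stand-in for partition pruning: keep only the partitions the query reads.
    for rel in (plan.left, plan.right):
        wanted = plan.partition_filter.get(rel.name)
        if wanted is not None:
            rel.kept = wanted & set(rel.partition_sizes)
    return plan


def order_by_size(plan: JoinPlan) -> JoinPlan:
    # Stand-in for a statistics-based rule: put the smaller relation first.
    rels = sorted((plan.left, plan.right), key=lambda r: r.estimated_size())
    plan.join_order = [r.name for r in rels]
    return plan


def run(rules: List[Callable[[JoinPlan], JoinPlan]], plan: JoinPlan) -> JoinPlan:
    # Apply the rules once, in the order given.
    for rule in rules:
        plan = rule(plan)
    return plan


def make_plan() -> JoinPlan:
    sales = Relation("sales", {"2018-01": 400, "2018-02": 500, "2018-03": 600})
    stores = Relation("stores", {"all": 800})
    # The query only reads one partition of "sales".
    return JoinPlan(sales, stores, {"sales": {"2018-03"}})


stats_first = run([order_by_size, prune_partitions], make_plan())
prune_first = run([prune_partitions, order_by_size], make_plan())

print(stats_first.join_order)  # ['stores', 'sales']
print(prune_first.join_order)  # ['sales', 'stores']
```

When the size-based rule runs first, it sees the full 1500 bytes of "sales"
and treats "stores" as the smaller side. After pruning, "sales" is only 600
bytes, so the same rule picks the opposite order.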