[
https://issues.apache.org/jira/browse/SPARK-26375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wang, Gang updated SPARK-26375:
-------------------------------
Description: In Catalyst, some optimizer rules are based on table statistics,
such as ReorderJoin, in which star schemas are detected, and
CostBasedJoinReorder. For these rules, accurate statistics are crucial. However,
all of these rules currently fire before partition pruning, so the statistics
they use may be inaccurate. (was: In catalyst, some optimize rules are base on
table statistics, like rule ReorderJoin, in which star schema is detected, and
CostBasedJoinReorder. In these rules, statistics accuracy are crucial. While,
currently all these rules are fired before partition pruning, which may get
inaccurate statistics.)
> Rule PruneFileSourcePartitions should be fired before any other rules based
> on table statistics
> -----------------------------------------------------------------------------------------------
>
> Key: SPARK-26375
> URL: https://issues.apache.org/jira/browse/SPARK-26375
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer
> Affects Versions: 2.3.0
> Reporter: Wang, Gang
> Priority: Major
>
> In Catalyst, some optimizer rules are based on table statistics, such as
> ReorderJoin, in which star schemas are detected, and CostBasedJoinReorder.
> For these rules, accurate statistics are crucial. However, all of these
> rules currently fire before partition pruning, so the statistics they use
> may be inaccurate.
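
To illustrate the ordering problem described above, here is a minimal toy model
in Python. It is not Spark code and does not use any Catalyst API: the Table
class, the pick_smaller_side function, the table names, and the row counts are
all hypothetical, chosen only to show how a statistics-based decision can flip
depending on whether partition pruning has already been applied.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Table:
    """Hypothetical partitioned table: partition value -> row count."""
    name: str
    partition_rows: Dict[str, int]

    def row_count(self, kept_partitions: Optional[Set[str]] = None) -> int:
        # Without pruning, the statistic covers every partition.
        if kept_partitions is None:
            return sum(self.partition_rows.values())
        # With pruning, only the partitions that survive the filter count.
        return sum(n for p, n in self.partition_rows.items()
                   if p in kept_partitions)


def pick_smaller_side(stats: Dict[str, int]) -> str:
    """Toy stand-in for a statistics-based rule: pick the smallest table."""
    return min(stats, key=stats.get)


# Made-up data: one large partitioned table and one mid-sized table.
sales = Table("sales", {"2018-12-01": 1_000,
                        "2018-12-02": 1_200,
                        "2018-12-03": 900_000})
stores = Table("stores", {"all": 50_000})

# Assume the query filters sales down to a single small partition.
kept = {"2018-12-01"}

# Ordering 1: the statistics-based rule runs before partition pruning.
stats_before = {"sales": sales.row_count(), "stores": stores.row_count()}
choice_before = pick_smaller_side(stats_before)

# Ordering 2: partition pruning runs first, then the statistics-based rule.
stats_after = {"sales": sales.row_count(kept), "stores": stores.row_count()}
choice_after = pick_smaller_side(stats_after)

print(choice_before, choice_after)
```

In this sketch the rule treats stores as the smaller side when it sees unpruned
statistics, but picks sales once pruning has been applied first, which is the
kind of discrepancy the issue is concerned with.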