[ 
https://issues.apache.org/jira/browse/SPARK-26375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated SPARK-26375:
-------------------------------
    Description: In Catalyst, some optimizer rules are based on table statistics, 
such as ReorderJoin (which detects star schemas) and CostBasedJoinReorder. 
Accurate statistics are crucial for these rules. However, all of them are 
currently fired before partition pruning, so they may use inaccurate 
statistics.  (was: In catalyst, some optimize rules are base on 
table statistics, like rule ReorderJoin, in which star schema is detected, and 
CostBasedJoinReorder. In these rules, statistics accuracy are crucial. While, 
currently all these rules are fired before partition pruning, which may get 
inaccurate statistics.)

> Rule PruneFileSourcePartitions should be fired before any other rules based 
> on table statistics
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26375
>                 URL: https://issues.apache.org/jira/browse/SPARK-26375
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 2.3.0
>            Reporter: Wang, Gang
>            Priority: Major
>
> In Catalyst, some optimizer rules are based on table statistics, such as 
> ReorderJoin (which detects star schemas) and CostBasedJoinReorder. Accurate 
> statistics are crucial for these rules. However, all of them are currently 
> fired before partition pruning, so they may use inaccurate statistics.
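
To illustrate the concern, here is a minimal toy sketch in Python. It is not Spark code: the `Table` class, the `join_order` function, and the row counts are all hypothetical, and the greedy "smallest estimate first" ordering is only a stand-in for a cost-based join reorder. It shows how an estimate taken before partition pruning can lead to a different join order than one taken after pruning.

```python
# Toy model only: NOT Spark code. Every name and number here is hypothetical.
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    partition_rows: dict  # partition value -> row count

    def rows(self, partitions=None):
        """Row-count estimate; `partitions` is the set kept after pruning."""
        if partitions is None:
            # No pruning applied yet: the estimate covers the whole table.
            return sum(self.partition_rows.values())
        return sum(n for p, n in self.partition_rows.items() if p in partitions)


def join_order(estimates):
    """Greedy stand-in for a cost-based reorder: smallest estimate first."""
    return sorted(estimates, key=estimates.get)


sales = Table("sales", {"2018-12-01": 1_000, "2018-12-02": 900_000})
items = Table("items", {"all": 50_000})

# Assume the query filters `sales` down to one small partition.
kept = {"2018-12-01"}

before_pruning = {"sales": sales.rows(), "items": items.rows()}
after_pruning = {"sales": sales.rows(kept), "items": items.rows()}

order_before = join_order(before_pruning)
order_after = join_order(after_pruning)

print("before pruning:", before_pruning, "->", order_before)
print("after pruning: ", after_pruning, "->", order_after)
```

In this sketch the unpruned estimate makes `sales` look like the largest table, while the pruned estimate makes it the smallest, so the two orderings disagree. Whether a real query plan changes in the same way depends on the actual statistics and rules involved.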



