Hi, partition pruner is a common requirement. You could provide a rule which push partition evaluated by filter condition into a TableScan. There are two ways to ensure the converted result would be chosen in the final plan. 1. If use CBO planner, you could override the `getRowCount ` by merging all the row count of matched partitions into total row count. After apply the rule, the total row count must be less than before. 2. You could use HBO planner which I thought is proper for this case, because we could always get benifite from partition pruner.
Apache Flink also provides Partitioner pruner, you could find it's implementation here [1]. Hope it helps. [1] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.java Best, Jing Zhang Jacques Nadeau <[email protected]> 于2021年12月13日周一 01:57写道: > I don't have a great pointer to an intelligent arbitrary filter coster. The > default one in Calcite isn't great (last I checked) as it considers more > filters to be equivalent to more reduction. This means it dramatically > overestimates set reduction. In Dremio, in simple cases we had to apply an > upper limit on reduction. > > For partitions specifically, we actually apply the condition to the > underlying partition details to get an accurate new cost. You can see the > base class of the pruning here: > > > https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/planner/logical/partition/PruneScanRuleBase.java > > In this code, the problem is decomposed into first doing search argument > pruning (sarg) followed by arbitrary expression pruning (using an > interpreter). > > Sorry I don't have a great pointer to a high-quality selectivity estimator > in OSS that uses more advanced stats. Maybe someone else can point to one. > > On Sat, Dec 11, 2021 at 11:43 PM Alessandro Solimando < > [email protected]> wrote: > > > Do you have any code pointer for achieving that, Jacques? > > > > My main concern is how to estimate the new cost. Do you leverage the > > estimation of predicate selectivity over the partitioning expression > maybe? > > > > Il Dom 12 Dic 2021, 05:48 Jacques Nadeau <[email protected]> ha > scritto: > > > > > What we have done in the past is push filters into a scan and alter the > > > costing (and estimated row count). In cases where the filter or > portions > > of > > > the filter can be applied against partitioning columns, you prune > > > partitions and use a new row count estimate/cost estimate based on the > > > reduced partition set. > > > > > > > > > On Fri, Dec 10, 2021 at 10:25 AM Maxim Gramin <[email protected] > > > > > wrote: > > > > > > > I assume that some of the filter conditions (which are involved in > the > > > > choice of partitions ) may by pushdown'ed to TableScan > > > > > > > > On Fri, Dec 10, 2021 at 7:29 PM Константин Новиков > > > > <[email protected]> wrote: > > > > > > > > > > > > > > Hi, > > > > > > > > > > Given some partitioned storage, we can omit the scan of some > > partitions > > > > > when a filter is present. How can the lower cost of the scan be > > > > > represented? As far as I can tell the current approach only allows > > > > > providing a single cost for the TableScan and Filter can only add > to > > > > > that. Should my implementation provide a rule that combines > > > > > Filter+TableScan? > > > > > > > > > > > > > > >
