Thank you Jacques and Jing Zhang for the code pointers! On Mon, 13 Dec 2021 at 03:58, Jing Zhang <[email protected]> wrote:
> Hi, > partition pruner is a common requirement. > You could provide a rule which push partition evaluated by filter condition > into a TableScan. > There are two ways to ensure the converted result would be chosen in the > final plan. > 1. If use CBO planner, you could override the `getRowCount ` by merging all > the row count of matched partitions into total row count. After apply the > rule, the total row count must be less than before. > 2. You could use HBO planner which I thought is proper for this case, > because we could always get benifite from partition pruner. > > Apache Flink also provides Partitioner pruner, you could find it's > implementation here [1]. Hope it helps. > > [1] > > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/logical/PushPartitionIntoTableSourceScanRule.java > > Best, > Jing Zhang > > Jacques Nadeau <[email protected]> 于2021年12月13日周一 01:57写道: > > > I don't have a great pointer to an intelligent arbitrary filter coster. > The > > default one in Calcite isn't great (last I checked) as it considers more > > filters to be equivalent to more reduction. This means it dramatically > > overestimates set reduction. In Dremio, in simple cases we had to apply > an > > upper limit on reduction. > > > > For partitions specifically, we actually apply the condition to the > > underlying partition details to get an accurate new cost. You can see the > > base class of the pruning here: > > > > > > > https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/planner/logical/partition/PruneScanRuleBase.java > > > > In this code, the problem is decomposed into first doing search argument > > pruning (sarg) followed by arbitrary expression pruning (using an > > interpreter). > > > > Sorry I don't have a great pointer to a high-quality selectivity > estimator > > in OSS that uses more advanced stats. Maybe someone else can point to > one. > > > > On Sat, Dec 11, 2021 at 11:43 PM Alessandro Solimando < > > [email protected]> wrote: > > > > > Do you have any code pointer for achieving that, Jacques? > > > > > > My main concern is how to estimate the new cost. Do you leverage the > > > estimation of predicate selectivity over the partitioning expression > > maybe? > > > > > > Il Dom 12 Dic 2021, 05:48 Jacques Nadeau <[email protected]> ha > > scritto: > > > > > > > What we have done in the past is push filters into a scan and alter > the > > > > costing (and estimated row count). In cases where the filter or > > portions > > > of > > > > the filter can be applied against partitioning columns, you prune > > > > partitions and use a new row count estimate/cost estimate based on > the > > > > reduced partition set. > > > > > > > > > > > > On Fri, Dec 10, 2021 at 10:25 AM Maxim Gramin < > [email protected] > > > > > > > wrote: > > > > > > > > > I assume that some of the filter conditions (which are involved in > > the > > > > > choice of partitions ) may by pushdown'ed to TableScan > > > > > > > > > > On Fri, Dec 10, 2021 at 7:29 PM Константин Новиков > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > Given some partitioned storage, we can omit the scan of some > > > partitions > > > > > > when a filter is present. How can the lower cost of the scan be > > > > > > represented? As far as I can tell the current approach only > allows > > > > > > providing a single cost for the TableScan and Filter can only add > > to > > > > > > that. Should my implementation provide a rule that combines > > > > > > Filter+TableScan? > > > > > > > > > > > > > > > > > > > > >
