I don't have a great pointer to an intelligent coster for arbitrary filters. The default one in Calcite isn't great (last I checked), as it treats every additional filter as additional reduction. This means it dramatically overestimates set reduction when many filters are stacked. In Dremio, we had to apply an upper limit on reduction in simple cases.
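To make the "upper limit on reduction" idea concrete, here is a minimal sketch. It is not Dremio's or Calcite's actual code: the class name, the per-filter guesses, and the 0.05 floor are all made up for illustration. It only shows how multiplying per-filter guesses collapses the estimate, and how a floor on selectivity bounds that.

```java
import java.util.List;

// Hypothetical illustration only; none of these names exist in Calcite or Dremio.
public class CappedSelectivity {

    // Naive estimate: multiply one guessed selectivity per filter.
    // Each extra filter shrinks the result, whether or not it is correlated
    // with the filters already applied.
    static double naive(List<Double> perFilterGuesses) {
        double selectivity = 1.0;
        for (double guess : perFilterGuesses) {
            selectivity *= guess;
        }
        return selectivity;
    }

    // Capped estimate: never let the combined selectivity drop below `floor`,
    // i.e. put an upper limit on how much reduction the filters can claim.
    static double capped(List<Double> perFilterGuesses, double floor) {
        return Math.max(floor, naive(perFilterGuesses));
    }

    public static void main(String[] args) {
        // Four filters, each guessed at 25% selectivity (an assumed value).
        List<Double> guesses = List.of(0.25, 0.25, 0.25, 0.25);
        double inputRows = 1_000_000;

        System.out.println("naive rows:  " + inputRows * naive(guesses));
        System.out.println("capped rows: " + inputRows * capped(guesses, 0.05));
    }
}
```

The floor is a blunt instrument. It does not make the estimate accurate; it only stops a long conjunction from driving the row count toward zero and making every downstream operator look free.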
For partitions specifically, we apply the condition to the underlying partition details to get an accurate new cost. You can see the base class of the pruning here:

https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/planner/logical/partition/PruneScanRuleBase.java

In this code, the problem is decomposed into two steps: first search argument (sarg) pruning, then arbitrary expression pruning (using an interpreter).

Sorry I don't have a great pointer to a high-quality selectivity estimator in OSS that uses more advanced stats. Maybe someone else can point to one.

On Sat, Dec 11, 2021 at 11:43 PM Alessandro Solimando <[email protected]> wrote:

> Do you have any code pointer for achieving that, Jacques?
>
> My main concern is how to estimate the new cost. Do you leverage the
> estimation of predicate selectivity over the partitioning expression maybe?
>
> On Sun, Dec 12, 2021, 05:48 Jacques Nadeau <[email protected]> wrote:
>
> > What we have done in the past is push filters into a scan and alter the
> > costing (and estimated row count). In cases where the filter or portions
> > of the filter can be applied against partitioning columns, you prune
> > partitions and use a new row count estimate/cost estimate based on the
> > reduced partition set.
> >
> > On Fri, Dec 10, 2021 at 10:25 AM Maxim Gramin <[email protected]>
> > wrote:
> >
> > > I assume that some of the filter conditions (those involved in the
> > > choice of partitions) may be pushed down to the TableScan.
> > >
> > > On Fri, Dec 10, 2021 at 7:29 PM Константин Новиков
> > > <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Given some partitioned storage, we can omit the scan of some
> > > > partitions when a filter is present. How can the lower cost of the
> > > > scan be represented? As far as I can tell, the current approach only
> > > > allows providing a single cost for the TableScan, and Filter can
> > > > only add to that. Should my implementation provide a rule that
> > > > combines Filter+TableScan?
