Hi Вадим, I'd like to share how the projections and filters are pushed down in the first place.
1. First, we need a RelNode that can apply projections and filters itself. In Calcite, this is BindableTableScan[1].
2. Then we need a rule that matches a Filter/Project on top of a Scan and pushes it into the Scan. In Calcite, this is done by FilterTableScanRule[2] and ProjectTableScanRule[3].
3. Finally, we need to translate the Scan with filters and/or projections into an executable form. This may differ between projects, because each has its own physical representation. In Calcite, BindableTableScan is transformed to TableScanNode[4], which further pushes the filters and projections into ProjectableFilterableTable[5].

Hence, to extend Calcite to push aggregations into the Scan, you need the same process: a physical Scan node that can do aggregations, and a rule that matches an Aggregate on top of a Scan and pushes it down. You also need to implement the corresponding physical logic.

If you want the Scan node to handle all of the projection/filter/aggregation pushdown, be careful with how they mix, because generally they are not pushed down in one go. For example, you may push an aggregation into a Scan that already has filters pushed into it.
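To make the shape of this process concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Calcite API: Node, Scan, Filter, Aggregate, pushFilter, pushAggregate, optimize and render are all hypothetical names invented for illustration, and the rendered string is generic SQL-like text, not verified S3 Select syntax. It only shows the idea of a Scan that accumulates pushed-down state, a rule per operator, and the ordering concern when filters and aggregates mix.

```java
import java.util.ArrayList;
import java.util.List;

public class PushdownSketch {
    // Toy plan nodes. These are NOT Calcite classes, just stand-ins.
    interface Node {}
    // A scan that remembers what has been pushed into it so far.
    record Scan(String table, List<String> filters, List<String> aggCalls) implements Node {}
    record Filter(Node input, String condition) implements Node {}
    record Aggregate(Node input, List<String> aggCalls) implements Node {}

    // Rule: Filter on top of Scan => Scan carrying the filter.
    // A filter above an already-aggregated scan applies to the aggregated
    // rows, so it must not be merged into the pre-aggregation filters.
    static Node pushFilter(Node node) {
        if (node instanceof Filter f && f.input() instanceof Scan s && s.aggCalls().isEmpty()) {
            List<String> filters = new ArrayList<>(s.filters());
            filters.add(f.condition());
            return new Scan(s.table(), filters, s.aggCalls());
        }
        return node; // rule does not match, keep the node as-is
    }

    // Rule: Aggregate on top of Scan => Scan carrying the aggregate calls.
    // Filters that were pushed earlier are preserved.
    static Node pushAggregate(Node node) {
        if (node instanceof Aggregate a && a.input() instanceof Scan s && s.aggCalls().isEmpty()) {
            return new Scan(s.table(), s.filters(), a.aggCalls());
        }
        return node;
    }

    // Apply the rules bottom-up, so inner operators are pushed first.
    static Node optimize(Node node) {
        if (node instanceof Filter f) {
            return pushFilter(new Filter(optimize(f.input()), f.condition()));
        }
        if (node instanceof Aggregate a) {
            return pushAggregate(new Aggregate(optimize(a.input()), a.aggCalls()));
        }
        return node;
    }

    // "Physical" step: turn the enriched scan into a query string
    // that a remote store could evaluate.
    static String render(Scan s) {
        String select = s.aggCalls().isEmpty() ? "*" : String.join(", ", s.aggCalls());
        String sql = "SELECT " + select + " FROM " + s.table();
        if (!s.filters().isEmpty()) {
            sql += " WHERE " + String.join(" AND ", s.filters());
        }
        return sql;
    }

    public static void main(String[] args) {
        Node plan = new Aggregate(
            new Filter(new Scan("t", List.of(), List.of()), "price > 10"),
            List.of("COUNT(*)"));
        Node optimized = optimize(plan);
        // prints: SELECT COUNT(*) FROM t WHERE price > 10
        System.out.println(render((Scan) optimized));
    }
}
```

In this sketch a Filter sitting above an already-aggregated Scan is deliberately left in place, which is one instance of the "mix" problem mentioned above. In real Calcite, the equivalent pieces would be a custom scan RelNode and planner rules, with the rendering done in the scan's physical implementation.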
Hope this helps~

[1] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/interpreter/Bindables.java#L207
[2] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/rel/rules/FilterTableScanRule.java#L57
[3] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/rel/rules/ProjectTableScanRule.java#L57
[4] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/interpreter/TableScanNode.java#L63
[5] https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/schema/ProjectableFilterableTable.java#L38

Вадим Ахмедов <[email protected]> wrote on Fri, Jun 17, 2022 at 16:59:

> Hi!
>
> I'm modifying a driver based on Apache Calcite that works with AWS S3
> storage using SQL queries. The interaction with S3 storage uses the S3
> Select dialect which is very similar to SQL. The driver uses
> ProjectableFilterableTable to scan CSV data loaded from AWS. The filters as
> a list of RexNodes are used in the scan method to transform SQL queries
> into AWS S3 Select queries. Thus push down of projects and filters is done
> into requests to the S3 storage.
>
> Now I need to modify the driver in such a way that the push down of
> aggregate functions additionally occurs.
>
> Calcite documentation has a hint:
> "If you want more control, you should write a planner rule. This will allow
> you to push down expressions, to make a cost-based decision about whether
> to push down processing, and push down more complex operations such as
> join, aggregation, and sort."
>
> I really need advice on how I can push down the aggregate functions with
> minimal modification of the driver source code. I have to ignore the
> aggregate functions in SQL somehow and push them into queries in S3 Select
> so that the aggregation occurs on the S3 side and not in memory.
>
> If I try to replace ProjectableFilterableTable with TranslatableTable the
> code will become 10 times more complicated.
>
> Maybe there is some simpler way to push down the aggregates?
>
> If TranslatableTable is the only way to solve this problem, what
> minimalistic example can I use for this?
>
> Driver source code
> https://github.com/amannm/lake-driver
>
> Thanks,
> Vadim A.
>

--

Best,
Benchao Li
