Hi Вадим,

I'd like to share how the projections and filters are pushed down
in the first place.

1. First, we need a RelNode that can apply projections and filters;
in Calcite, this is BindableTableScan[1].
2. Then we need a rule that matches a Filter or Project on top of a
Scan and pushes it into the Scan; in Calcite, this is done by
FilterTableScanRule[2] and ProjectTableScanRule[3].
3. Finally, we need to translate the Scan with filters and/or projections
into an executable form. This may be different for different projections
because they have their own physical representations. In Calcite,
BindableTableScan is transformed into TableScanNode[4], which
further pushes the filters and projections into
ProjectableFilterableTable[5].
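
To make the three steps above concrete, here is a minimal toy sketch in
plain Java. It does not use Calcite at all: Scan, Filter,
pushFilterIntoScan and execute are hypothetical names that only mirror
the roles of the scan node, the rule, and the executable form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: none of these classes are Calcite APIs.
class FilterPushdownSketch {
    // Step 1: a scan node that can carry filters and projected column indexes.
    static final class Scan {
        final List<Predicate<Object[]>> filters; // evaluated on full table rows
        final int[] projects;                    // null means "all columns"

        Scan(List<Predicate<Object[]>> filters, int[] projects) {
            this.filters = filters;
            this.projects = projects;
        }
    }

    // A logical filter sitting on top of a scan.
    static final class Filter {
        final Predicate<Object[]> condition;
        final Scan input;

        Filter(Predicate<Object[]> condition, Scan input) {
            this.condition = condition;
            this.input = input;
        }
    }

    // Step 2: the "rule". It matches Filter-on-Scan and returns a new Scan
    // that carries the filter condition in addition to what it already had.
    static Scan pushFilterIntoScan(Filter filter) {
        List<Predicate<Object[]>> merged = new ArrayList<>(filter.input.filters);
        merged.add(filter.condition);
        return new Scan(merged, filter.input.projects);
    }

    // Step 3: the executable form. Filters run first, then projection.
    static List<Object[]> execute(Scan scan, List<Object[]> rows) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            boolean keep = true;
            for (Predicate<Object[]> f : scan.filters) {
                if (!f.test(row)) {
                    keep = false;
                    break;
                }
            }
            if (!keep) {
                continue;
            }
            if (scan.projects == null) {
                out.add(row);
            } else {
                Object[] projected = new Object[scan.projects.length];
                for (int i = 0; i < scan.projects.length; i++) {
                    projected[i] = row[scan.projects[i]];
                }
                out.add(projected);
            }
        }
        return out;
    }
}
```

In this toy model the filters are evaluated against the full row before
the projection is applied, so a filter may refer to a column that is not
projected.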

Hence, to extend Calcite to push aggregations into a Scan, you need
the same process: a physical Scan node that can do aggregations, and
a rule that matches an Aggregate on top of a Scan and pushes it down.
You then also need to implement the corresponding physical logic.
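
As a rough illustration of that process, here is a toy sketch in plain
Java (again, not Calcite code). Scan, Aggregate, pushAggregateIntoScan
and toQuery are hypothetical names, and the generated query text is only
a SQL-like string for an imaginary backend. The sketch also assumes the
backend cannot group, so the rule declines when there are group keys.

```java
import java.util.List;

// Toy model only: none of these classes are Calcite APIs.
class AggregatePushdownSketch {
    // A scan node that can carry a pushed-down filter and aggregate calls.
    static final class Scan {
        final String table;
        final String filter;         // SQL-like condition text, or null
        final List<String> aggCalls; // e.g. "COUNT(*)"; empty means plain scan

        Scan(String table, String filter, List<String> aggCalls) {
            this.table = table;
            this.filter = filter;
            this.aggCalls = aggCalls;
        }
    }

    // A logical aggregate sitting on top of a scan.
    static final class Aggregate {
        final List<String> groupKeys;
        final List<String> aggCalls;
        final Scan input;

        Aggregate(List<String> groupKeys, List<String> aggCalls, Scan input) {
            this.groupKeys = groupKeys;
            this.aggCalls = aggCalls;
            this.input = input;
        }
    }

    // The "rule": returns the rewritten Scan, or null when it does not apply.
    static Scan pushAggregateIntoScan(Aggregate agg) {
        if (!agg.groupKeys.isEmpty()) {
            return null; // assumption: the backend cannot evaluate GROUP BY
        }
        if (!agg.input.aggCalls.isEmpty()) {
            return null; // the scan already aggregates; do not stack another
        }
        // Keep whatever was pushed earlier (the filter) and add the aggregates.
        return new Scan(agg.input.table, agg.input.filter, agg.aggCalls);
    }

    // The "physical logic": render the scan as a query for the backend.
    static String toQuery(Scan scan) {
        String select = scan.aggCalls.isEmpty()
            ? "*"
            : String.join(", ", scan.aggCalls);
        String sql = "SELECT " + select + " FROM " + scan.table;
        return scan.filter == null ? sql : sql + " WHERE " + scan.filter;
    }
}
```

For a scan of table t with filter "price > 10" and an aggregate with the
calls COUNT(*) and MAX(price) and no group keys, toQuery on the rewritten
scan yields "SELECT COUNT(*), MAX(price) FROM t WHERE price > 10".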

If you want the Scan node to do all of the projection/filter/aggregation
pushdown, you need to be careful about how they mix, because generally
they are not pushed down in one go, e.g. you may push an aggregation into
a Scan that already has filters pushed into it.
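
One way to think about the mix is to treat the Scan's pushed-down state
as an immutable value that each push either extends or rejects. The toy
sketch below (plain Java, not Calcite code; ScanState, withFilter and
withAggregate are hypothetical names) allows an aggregate on top of
existing filters, but rejects a filter that arrives after an aggregate,
since that filter would apply to the aggregated rows rather than to the
table rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Toy model only: none of these classes are Calcite APIs.
class MixedPushdownSketch {
    static final class ScanState {
        final List<String> filters;  // conditions applied before aggregation
        final List<String> aggCalls; // empty means no aggregate pushed yet

        ScanState(List<String> filters, List<String> aggCalls) {
            this.filters = Collections.unmodifiableList(new ArrayList<>(filters));
            this.aggCalls = Collections.unmodifiableList(new ArrayList<>(aggCalls));
        }

        static ScanState empty() {
            return new ScanState(
                Collections.<String>emptyList(), Collections.<String>emptyList());
        }

        // An aggregate may be pushed into a scan that already has filters:
        // the filters are kept and still run before the aggregation.
        Optional<ScanState> withAggregate(List<String> calls) {
            if (!aggCalls.isEmpty()) {
                return Optional.empty(); // do not stack a second aggregate
            }
            return Optional.of(new ScanState(filters, calls));
        }

        // A filter above an aggregated scan sees aggregated rows, so it
        // cannot simply be merged into the pre-aggregation filter list.
        Optional<ScanState> withFilter(String condition) {
            if (!aggCalls.isEmpty()) {
                return Optional.empty();
            }
            List<String> merged = new ArrayList<>(filters);
            merged.add(condition);
            return Optional.of(new ScanState(merged, aggCalls));
        }
    }
}
```

An empty Optional here plays the role of a rule that simply does not
fire, leaving the operator above the Scan to be evaluated as usual.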

Hope this helps~

[1]
https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/interpreter/Bindables.java#L207
[2]
https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/rel/rules/FilterTableScanRule.java#L57
[3]
https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/rel/rules/ProjectTableScanRule.java#L57
[4]
https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/interpreter/TableScanNode.java#L63
[5]
https://github.com/apache/calcite/blob/de41df4d117041fbee042e07f70e6043f1fe626d/core/src/main/java/org/apache/calcite/schema/ProjectableFilterableTable.java#L38

Вадим Ахмедов <[email protected]> wrote on Fri, Jun 17, 2022 at 16:59:

> Hi!
>
> I'm modifying a driver based on Apache Calcite that works with AWS S3
> storage using SQL queries. The interaction with S3 storage uses the S3
> Select dialect which is very similar to SQL. The driver uses
> ProjectableFilterableTable to scan CSV data loaded from AWS. The filters as
> a list of RexNodes are used in the scan method to transform SQL queries
> into AWS S3 Select queries. Thus push down of projects and filters is done
> into requests to the S3 storage.
>
> Now I need to modify the driver in such a way that the push down of
> aggregate functions additionally occurs.
>
> Calcite documentation has a hint:
> "If you want more control, you should write a planner rule. This will allow
> you to push down expressions, to make a cost-based decision about whether
> to push down processing, and push down more complex operations such as
> join, aggregation, and sort."
>
> I really need advice on how I can push down the aggregate functions with
> minimal modification of the driver source code. I have to ignore the
> aggregate functions in SQL somehow and push them into queries in S3 Select
> so that the aggregation occurs on the S3 side and not in memory.
>
> If I try to replace ProjectableFilterableTable with TranslatableTable the
> code will become 10 times more complicated.
>
> Maybe there is some simpler way to push down the aggregates?
>
> If TranslatableTable is the only way to solve this problem, what
> minimalistic example can I use for this?
>
> Driver source code
> https://github.com/amannm/lake-driver
>
> Thanks,
> Vadim A.
>


-- 

Best,
Benchao Li
