Hi!

I'm modifying a driver based on Apache Calcite that queries data in AWS S3
storage using SQL. The driver talks to S3 through S3 Select, whose query
dialect is very similar to SQL. It implements ProjectableFilterableTable to
scan CSV data loaded from AWS. In the scan method, the filters (a list of
RexNodes) are used to translate the SQL query into an AWS S3 Select query,
so projects and filters are already pushed down into the requests sent to
the S3 storage.
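
For context, the current translation can be pictured roughly as follows.
This is only a simplified sketch, not the driver's actual code: the class
and method names are hypothetical, the filters are assumed to be already
rendered as S3 Select predicate text (the real scan method receives
RexNodes), and the columns are addressed by S3 Select's positional names
(_1, _2, ...).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Calcite or of the driver.
final class S3SelectQuerySketch {

    // projects: zero-based column indexes as passed to scan(), or null for all columns.
    // filters:  predicates assumed to be already rendered as S3 Select text.
    static String buildQuery(int[] projects, List<String> filters) {
        StringBuilder sql = new StringBuilder("SELECT ");
        if (projects == null || projects.length == 0) {
            // Simplification: no projection list means "all columns".
            sql.append("*");
        } else {
            List<String> cols = new ArrayList<>();
            for (int p : projects) {
                // S3 Select positional columns are one-based: _1, _2, ...
                cols.add("s._" + (p + 1));
            }
            sql.append(String.join(", ", cols));
        }
        sql.append(" FROM S3Object s");
        if (!filters.isEmpty()) {
            // Pushed-down filters are combined with AND.
            sql.append(" WHERE ").append(String.join(" AND ", filters));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // prints: SELECT s._1, s._3 FROM S3Object s WHERE s._2 = 'EU'
        System.out.println(buildQuery(new int[] {0, 2}, List.of("s._2 = 'EU'")));
    }
}
```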

Now I need to modify the driver so that aggregate functions are pushed
down as well.

The Calcite documentation has a hint:
"If you want more control, you should write a planner rule. This will allow
you to push down expressions, to make a cost-based decision about whether
to push down processing, and push down more complex operations such as
join, aggregation, and sort."

I really need advice on how to push down the aggregate functions with
minimal changes to the driver source code. Somehow I have to keep Calcite
from evaluating the aggregate functions itself and instead put them into
the S3 Select queries, so that the aggregation happens on the S3 side and
not in memory.

If I replace ProjectableFilterableTable with TranslatableTable, the code
will become about 10 times more complicated.

Is there a simpler way to push down the aggregates?

If TranslatableTable is the only way to solve this problem, what minimal
example can I use as a starting point?

Driver source code:
https://github.com/amannm/lake-driver

Thanks,
Vadim A.
