Hi! I'm modifying a driver, based on Apache Calcite, that runs SQL queries against AWS S3 storage. It talks to S3 through the S3 Select dialect, which is very similar to SQL. The driver uses ProjectableFilterableTable to scan CSV data loaded from AWS. In the scan method, the filters (a list of RexNodes) are used to transform the SQL query into an AWS S3 Select query, so projects and filters are pushed down into the requests sent to S3.
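To make the current pushdown concrete, here is a simplified sketch of the kind of query string the scan ends up producing. This is not the driver's actual code: the class S3SelectSketch, the method buildQuery, and the column names are made up for illustration, and the filter conditions are assumed to be already translated from RexNodes into S3 Select text.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of lake-driver or Calcite.
final class S3SelectSketch {

    // Builds an S3 Select statement from the projected column names and from
    // filter conditions that were already rendered as S3 Select expressions.
    static String buildQuery(List<String> columns, List<String> conditions) {
        // No projection means "all columns".
        String select = columns.isEmpty()
                ? "*"
                : columns.stream()
                        .map(c -> "s.\"" + c + "\"")
                        .collect(Collectors.joining(", "));

        StringBuilder sql = new StringBuilder("SELECT ")
                .append(select)
                .append(" FROM S3Object s");

        // All pushed-down filters are combined with AND.
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(
                List.of("name", "price"),
                List.of("CAST(s.\"price\" AS FLOAT) > 10")));
    }
}
```

In this sketch the projection and the filters both end up in the text of the S3 Select request, so only matching rows and columns come back from S3.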
Now I need to modify the driver so that aggregate functions are pushed down as well. The Calcite documentation has a hint: "If you want more control, you should write a planner rule. This will allow you to push down expressions, to make a cost-based decision about whether to push down processing, and push down more complex operations such as join, aggregation, and sort."

I would really appreciate advice on how to push down the aggregate functions with minimal changes to the driver source code. Somehow I have to make Calcite skip the aggregate functions in the SQL and push them into the S3 Select query instead, so that the aggregation happens on the S3 side and not in memory. If I replace ProjectableFilterableTable with TranslatableTable, the code will become 10 times more complicated. Is there a simpler way to push down the aggregates? If TranslatableTable is the only way to solve this problem, what minimal example can I use as a starting point?

Driver source code: https://github.com/amannm/lake-driver

Thanks,
Vadim A.
