hongkunxu opened a new pull request, #18681:
URL: https://github.com/apache/pinot/pull/18681
## Support scalar functions in AggregationSubsumptionStrategy projection
match
### What
`AggregationSubsumptionStrategy` previously assumed that **any**
function-call
expression in a query's SELECT list was an aggregate that required an
`AggregationEquivalence` re-aggregation rule. As a result, plain **scalar
grouping functions** (e.g. `DATETRUNC`, `UPPER`, `LOWER`, `SUBSTR`) were
routed
into the aggregation-equivalence path, found no rule, and the candidate MV
was
rejected at the PROJECTION stage.
This PR makes the projection match distinguish *aggregate* from
*non-aggregate*
expressions via `CalciteSqlParser.isAggregateExpression(...)`:
- **Non-aggregate** expressions (plain columns **and** scalar functions) now
match by a direct MV projection hit — the same treatment scan queries
already
get.
- **Aggregate** expressions (including nested ones like `ROUND(SUM(x))`) keep
requiring a registered `AggregationEquivalence` rule and are rejected when
none
exists.
The same aggregate-vs-scalar split is applied consistently in the SELECT
rewrite and in the HAVING / ORDER BY remap, so a scalar grouping function is
mapped straight to its MV column wherever it appears.
### Usage examples (previously missed the MV, now rewrite to it)
Given an MV grouped by a scalar time bucket:
```sql
CREATE MATERIALIZED VIEW airlineStatsTs_mv_datetrunc AS
SELECT DATETRUNC('DAY', ts) AS dayBucket, Carrier,
SUM(ArrDelay) AS sum_ArrDelay, COUNT(*) AS flight_count
FROM airlineStatsTs
GROUP BY DATETRUNC('DAY', ts), Carrier;
```
**1. Scalar function (`DATETRUNC`) in SELECT + GROUP BY**
```sql
SELECT DATETRUNC('DAY', ts), SUM(ArrDelay)
FROM airlineStatsTs
GROUP BY DATETRUNC('DAY', ts);
```
- Before: rejected at PROJECTION (`DATETRUNC` treated as an aggregate
needing a
re-aggregation rule).
- After: `DATETRUNC('DAY', ts)` is a direct MV projection hit; rewrites to
the MV.
**2. Multiple scalar functions (`UPPER` / `SUBSTR`) as grouping keys**
For an MV grouped by `UPPER(Carrier)` and `SUBSTR(Origin, 0, 1)`:
```sql
SELECT UPPER(Carrier), SUBSTR(Origin, 0, 1), COUNT(*)
FROM airlineStatsTs
GROUP BY UPPER(Carrier), SUBSTR(Origin, 0, 1);
```
- Before: rejected at PROJECTION (`UPPER` / `SUBSTR` treated as aggregates).
- After: both scalar groupings are direct MV projection hits; rewrites to
the MV.
### Tests
Added `AggregationSubsumptionStrategyTest` cases: scalar exact match, scalar
in
ORDER BY, scalar in HAVING, finer-granularity re-aggregation with a scalar
key,
negative case for a non-materialized scalar, and a `ROUND(SUM(x))` regression
ensuring nested aggregates stay on the equivalence path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]