[PR] [MaterializedView] Add scalar function support to AggregationSubsumptionStrategy [pinot]

via GitHub Thu, 04 Jun 2026 02:42:54 -0700


hongkunxu opened a new pull request, #18681:
URL: https://github.com/apache/pinot/pull/18681


   ## Support scalar functions in AggregationSubsumptionStrategy projection 
match
   
   ### What
   
   `AggregationSubsumptionStrategy` previously assumed that **any** 
function-call
   expression in a query's SELECT list was an aggregate that required an
   `AggregationEquivalence` re-aggregation rule. As a result, plain **scalar
   grouping functions** (e.g. `DATETRUNC`, `UPPER`, `LOWER`, `SUBSTR`) were 
routed
   into the aggregation-equivalence path, found no rule, and the candidate MV 
was
   rejected at the PROJECTION stage.
   
   This PR makes the projection match distinguish *aggregate* from 
*non-aggregate*
   expressions via `CalciteSqlParser.isAggregateExpression(...)`:
   
   - **Non-aggregate** expressions (plain columns **and** scalar functions) now
     match by a direct MV projection hit — the same treatment scan queries 
already
     get.
   - **Aggregate** expressions (including nested ones like `ROUND(SUM(x))`) keep
     requiring a registered `AggregationEquivalence` rule and are rejected when 
none
     exists.
   
   The same aggregate-vs-scalar split is applied consistently in the SELECT
   rewrite and in the HAVING / ORDER BY remap, so a scalar grouping function is
   mapped straight to its MV column wherever it appears.
   
   ### Usage examples (previously missed the MV, now rewrite to it)
   
   Given an MV grouped by a scalar time bucket:
   
   ```sql
   CREATE MATERIALIZED VIEW airlineStatsTs_mv_datetrunc AS
   SELECT DATETRUNC('DAY', ts) AS dayBucket, Carrier,
          SUM(ArrDelay) AS sum_ArrDelay, COUNT(*) AS flight_count
   FROM airlineStatsTs
   GROUP BY DATETRUNC('DAY', ts), Carrier;
   ```
   
   **1. Scalar function (`DATETRUNC`) in SELECT + GROUP BY**
   
   ```sql
   SELECT DATETRUNC('DAY', ts), SUM(ArrDelay)
   FROM airlineStatsTs
   GROUP BY DATETRUNC('DAY', ts);
   ```
   
   - Before: rejected at PROJECTION (`DATETRUNC` treated as an aggregate 
needing a
     re-aggregation rule).
   - After: `DATETRUNC('DAY', ts)` is a direct MV projection hit; rewrites to 
the MV.
   
   **2. Multiple scalar functions (`UPPER` / `SUBSTR`) as grouping keys**
   
   For an MV grouped by `UPPER(Carrier)` and `SUBSTR(Origin, 0, 1)`:
   
   ```sql
   SELECT UPPER(Carrier), SUBSTR(Origin, 0, 1), COUNT(*)
   FROM airlineStatsTs
   GROUP BY UPPER(Carrier), SUBSTR(Origin, 0, 1);
   ```
   
   - Before: rejected at PROJECTION (`UPPER` / `SUBSTR` treated as aggregates).
   - After: both scalar groupings are direct MV projection hits; rewrites to 
the MV.
   
   ### Tests
   
   Added `AggregationSubsumptionStrategyTest` cases: scalar exact match, scalar 
in
   ORDER BY, scalar in HAVING, finer-granularity re-aggregation with a scalar 
key,
   negative case for a non-materialized scalar, and a `ROUND(SUM(x))` regression
   ensuring nested aggregates stay on the equivalence path.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] [MaterializedView] Add scalar function support to AggregationSubsumptionStrategy [pinot]

Reply via email to