xiangfu0 opened a new pull request, #18664:
URL: https://github.com/apache/pinot/pull/18664

   ## Summary
   
   Adds SQL `GROUPING SETS` / `ROLLUP` / `CUBE` and the `GROUPING()` / 
`GROUPING_ID()` indicator functions, working end-to-end on **both** query 
engines.
   
   ### Single-stage engine (SSE) — native, single scan
   - New optional Thrift field `PinotQuery.groupingSetsMasks` carries the 
per-set participation masks over the union of group-by columns.
   - `CalciteSqlParser` normalizes `ROLLUP` / `CUBE` / `GROUPING SETS` 
(including mixed forms like `a, ROLLUP(b, c)`) into the union columns + masks, 
and rewrites `GROUPING(col)` / `GROUPING_ID(...)` onto a synthetic internal 
`$groupingId` key column.
   - `GroupingSetsGroupKeyGenerator` maps each row to one group per set (via 
the existing multi-valued group-by path), with `$groupingId` as an extra key 
column that keeps a rolled-up NULL from colliding with a real-data NULL.
   - The combine/reduce path is migrated from N to N+1 key columns, and the 
NULL-bitmap serialization path is forced on for rolled-up columns so they 
round-trip as `NULL` regardless of null-handling mode. Star-tree is disabled 
for these queries.
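
The normalization and key-generation steps above can be sketched roughly as follows. This is an illustrative model only, not the PR's Java code: the helper names (`rollup_sets`, `cube_sets`, `to_masks`, `grouping_id`, `group_keys`) are hypothetical, and the bit convention (1 = rolled up, first column is the most significant bit) follows the usual SQL `GROUPING_ID` definition and is assumed rather than taken from the actual `groupingSetsMasks` encoding.

```python
from itertools import combinations

def rollup_sets(cols):
    """ROLLUP(a, b, c) -> (a, b, c), (a, b), (a,), ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """CUBE(a, b) -> every subset of the columns, largest first."""
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]

def to_masks(union_cols, grouping_sets):
    """One participation mask per grouping set, aligned with union_cols."""
    return [[c in s for c in union_cols] for s in grouping_sets]

def grouping_id(mask):
    """Assumed convention: bit is 1 when the column is rolled up."""
    gid = 0
    for participates in mask:
        gid = (gid << 1) | (0 if participates else 1)
    return gid

def group_keys(row, union_cols, masks):
    """Map one row to one key per set: N column values plus the grouping id."""
    keys = []
    for mask in masks:
        values = tuple(row[c] if p else None for c, p in zip(union_cols, mask))
        keys.append(values + (grouping_id(mask),))
    return keys

# Mixed form `GROUP BY a, ROLLUP(b, c)`: `a` participates in every set.
union_cols = ["a", "b", "c"]
mixed = [("a",) + s for s in rollup_sets(["b", "c"])]
masks = to_masks(union_cols, mixed)

# A row whose `c` is a real-data NULL: the extra key column keeps the
# detail group apart from the group where `c` was rolled up.
keys = group_keys({"a": "x", "b": "y", "c": None}, union_cols, masks)
```

In this sketch the detail key and the rolled-up key share the same column values (`"x"`, `"y"`, `None`) and differ only in the trailing grouping id, which is the collision the `$groupingId` key column is described as preventing.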
   
   ### Multi-stage engine (MSE) — UNION ALL expansion
   - A grouping-set `LogicalAggregate` is expanded into a `UNION ALL` of 
ordinary per-set aggregates (`GroupingSetsExpander`), with rolled-up columns 
projected as `NULL` and `GROUPING()` / `GROUPING_ID()` computed as per-branch 
constant literals. The multi-stage runtime therefore executes only standard 
`Union` / `Aggregate` / `Project` plans and needs no runtime changes.
   - `GROUPING` / `GROUPING_ID` registered in `PinotOperatorTable`; 
ROW-expression validation relaxed for the parenthesized grouping lists.
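
To make the shape of the expansion concrete, here is a small sketch that renders the equivalent `UNION ALL` query as SQL text. It is only an illustration: per the description above, the real `GroupingSetsExpander` works on a `LogicalAggregate`, not on SQL strings, and the function names, the `gid` alias, and the untyped `NULL` projections are assumptions made for this example.

```python
def grouping_id(group_cols, kept):
    """Assumed SQL-style convention: bit is 1 when the column is rolled up."""
    gid = 0
    for col in group_cols:
        gid = (gid << 1) | (0 if col in kept else 1)
    return gid

def expand_to_union_all(table, group_cols, grouping_sets, agg):
    """Render one ordinary aggregate per grouping set, joined by UNION ALL."""
    branches = []
    for kept in grouping_sets:
        # Rolled-up columns are projected as NULL in this branch.
        projections = [c if c in kept else f"NULL AS {c}" for c in group_cols]
        projections.append(agg)
        # GROUPING_ID is a constant within a branch, so it becomes a literal.
        projections.append(f"{grouping_id(group_cols, kept)} AS gid")
        sql = f"SELECT {', '.join(projections)} FROM {table}"
        if kept:
            sql += f" GROUP BY {', '.join(kept)}"
        branches.append(sql)
    return "\nUNION ALL\n".join(branches)

# ROLLUP(country, city) -> (country, city), (country), ()
sql = expand_to_union_all(
    "sales",
    ["country", "city"],
    [("country", "city"), ("country",), ()],
    "SUM(sales) AS total",
)
print(sql)
```

Each branch is a plain aggregate plus a projection, which matches the statement that the runtime only sees standard `Union` / `Aggregate` / `Project` plans.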
   
   ## Example
   ```sql
   SELECT country, city, SUM(sales), GROUPING(country), GROUPING(city)
   FROM   sales
   GROUP  BY ROLLUP(country, city)
   ```
   returns the detail rows (both `GROUPING` values = 0), per-country subtotals 
(`city` = NULL, `GROUPING(city)` = 1), and the grand total (`country` and 
`city` = NULL, `GROUPING(country)` = `GROUPING(city)` = 1).
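
As a worked illustration of that result shape, the following sketch evaluates the same ROLLUP over three made-up rows in plain Python. The data values and the `rollup` helper are hypothetical and exist only for this example; no Pinot code is involved.

```python
from collections import defaultdict

# Hypothetical sample data: (country, city, sales).
rows = [
    ("US", "NYC", 10),
    ("US", "SF", 20),
    ("FR", "Paris", 5),
]

def rollup(rows):
    """Emulate GROUP BY ROLLUP(country, city) with the two GROUPING() flags.

    Each output tuple is
    (country, city, SUM(sales), GROUPING(country), GROUPING(city)).
    """
    totals = defaultdict(int)
    for country, city, sales in rows:
        totals[(country, city, 0, 0)] += sales  # detail row
        totals[(country, None, 0, 1)] += sales  # per-country subtotal
        totals[(None, None, 1, 1)] += sales     # grand total
    return [key[:2] + (total,) + key[2:] for key, total in totals.items()]

result = rollup(rows)
for row in result:
    print(row)
```

With this input the output has six rows: three detail rows, two per-country subtotals (US = 30, FR = 5), and one grand total of 35.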
   
   ## Testing
   - Unit tests: bit conventions, scalar functions, parser normalization + 
`GROUPING` rewrite, Thrift wire round-trip.
   - In-process server-execution + broker-reduce tests 
(`GroupingSetsQueriesTest`), covering ROLLUP/CUBE/GROUPING SETS, 
`GROUPING`/`GROUPING_ID`, HAVING, both ORDER BY paths, the empty-server schema, 
and the multi-valued-column rejection.
   - MSE planner tests (`GroupingSetsPlannerTest`) + 191 existing planner tests 
(no regression).
   - Cluster integration test (`GroupingSetsTest`) running every query on 
**both** engines.
   
   ## Limitations / follow-ups
   - Multi-valued group-by columns inside grouping sets are rejected with a 
clear error (single-valued only for now).
   - The new `groupingSetsMasks` Thrift field is `optional` and 
wire-compatible, but grouping-set queries require all servers to be upgraded: 
an un-upgraded server would ignore the field and run a plain GROUP BY. This is 
a new query capability (no existing-query regression) and should be noted in 
release notes; a broker-side min-version gate is a sensible follow-up.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

