xiangfu0 opened a new pull request, #18664:
URL: https://github.com/apache/pinot/pull/18664
## Summary

Adds SQL `GROUPING SETS` / `ROLLUP` / `CUBE` and the `GROUPING()` / `GROUPING_ID()` indicator functions, working end-to-end on **both** query engines.

### Single-stage engine (SSE): native, single scan

- A new optional Thrift field, `PinotQuery.groupingSetsMasks`, carries the per-set participation masks over the union of group-by columns.
- `CalciteSqlParser` normalizes `ROLLUP` / `CUBE` / `GROUPING SETS` into the union columns plus masks. This includes mixed forms such as `a, ROLLUP(b, c)`. The parser also rewrites `GROUPING(col)` / `GROUPING_ID(...)` onto a synthetic internal `$groupingId` key column.
- `GroupingSetsGroupKeyGenerator` maps each row to one group per set, reusing the existing multi-valued group-by path. `$groupingId` is an extra key column, so a rolled-up NULL cannot collide with a real-data NULL.
- The combine/reduce path is migrated from N to N+1 key columns. The NULL-bitmap serialization path is forced on for rolled-up columns, so they round-trip as `NULL` regardless of the null-handling mode. Star-tree is disabled for these queries.

### Multi-stage engine (MSE): UNION ALL expansion

- `GroupingSetsExpander` expands a grouping-set `LogicalAggregate` into a `UNION ALL` of ordinary per-set aggregates. Rolled-up columns are projected as `NULL`, and `GROUPING()` / `GROUPING_ID()` are computed as per-branch constant literals. The multi-stage runtime therefore executes only standard `Union` / `Aggregate` / `Project` plans and needs no runtime changes.
- `GROUPING` / `GROUPING_ID` are registered in `PinotOperatorTable`, and ROW-expression validation is relaxed for the parenthesized grouping lists.

## Example

```sql
SELECT country, city, SUM(sales), GROUPING(country), GROUPING(city)
FROM sales
GROUP BY ROLLUP(country, city)
```

This query returns three kinds of rows:

- the detail rows;
- the per-country subtotals (`city` = NULL, `GROUPING(city)` = 1);
- the grand total (`country` and `city` = NULL, both `GROUPING` values = 1).
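The following is a minimal sketch of the parser-side normalization. It assumes that bit `i` of a mask is set when union column `i` participates in the set; the PR does not spell out its exact bit convention. `GroupingMasksSketch` and its methods are hypothetical and are not Pinot's `CalciteSqlParser` API. `GROUPING_ID` follows the common SQL convention (as in PostgreSQL and SQL Server) of placing the last argument in the least-significant bit.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Pinot's CalciteSqlParser): normalizes plain columns, ROLLUP and CUBE
 * into participation masks over the union of group-by columns.
 * Assumed convention: bit i of a mask is set when union column i is grouped in that set.
 */
final class GroupingMasksSketch {

  /** A plain column contributes exactly one set containing just that column. */
  static List<Integer> column(int idx) {
    return List.of(1 << idx);
  }

  /** ROLLUP(c0, ..., ck-1): every prefix, from the full list down to the empty set. */
  static List<Integer> rollup(int... idx) {
    List<Integer> masks = new ArrayList<>();
    for (int len = idx.length; len >= 0; len--) {
      int mask = 0;
      for (int i = 0; i < len; i++) {
        mask |= 1 << idx[i];
      }
      masks.add(mask);
    }
    return masks;
  }

  /** CUBE(c0, ..., ck-1): every subset of the listed columns, largest first. */
  static List<Integer> cube(int... idx) {
    List<Integer> masks = new ArrayList<>();
    for (int subset = (1 << idx.length) - 1; subset >= 0; subset--) {
      int mask = 0;
      for (int i = 0; i < idx.length; i++) {
        if ((subset & (1 << i)) != 0) {
          mask |= 1 << idx[i];
        }
      }
      masks.add(mask);
    }
    return masks;
  }

  /** Comma-separated GROUP BY elements combine as a cross product: OR every pair of masks. */
  @SafeVarargs
  static List<Integer> cross(List<Integer>... elements) {
    List<Integer> result = List.of(0);
    for (List<Integer> element : elements) {
      List<Integer> next = new ArrayList<>();
      for (int left : result) {
        for (int right : element) {
          next.add(left | right);
        }
      }
      result = next;
    }
    return result;
  }

  /** GROUPING(col) is 1 when the column is rolled up (absent) in the set described by mask. */
  static int grouping(int mask, int idx) {
    return (mask & (1 << idx)) == 0 ? 1 : 0;
  }

  /** GROUPING_ID(c0, ..., ck-1): GROUPING bits, with the last argument as the least-significant bit. */
  static int groupingId(int mask, int... idx) {
    int id = 0;
    for (int i : idx) {
      id = (id << 1) | grouping(mask, i);
    }
    return id;
  }
}
```

Take `country` as union column 0 and `city` as column 1. Then `ROLLUP(country, city)` yields masks `3, 1, 0`, which are the detail, per-country, and grand-total sets. A mixed `a, ROLLUP(b, c)` is the cross product of `{a}` with the three `ROLLUP` prefixes.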
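A small in-memory aggregation shows why the extra `$groupingId` key column is needed. This is a hypothetical sketch, not `GroupingSetsGroupKeyGenerator`. It assumes the same participation-mask convention (bit `i` set means column `i` is grouped). It also assumes the extra key component stores the set's ordinal, which keeps duplicate sets apart as well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Pinot's GroupingSetsGroupKeyGenerator): each row is mapped to one
 * group key per grouping set. A key holds the union columns (rolled-up columns replaced by null)
 * plus a trailing "$groupingId" component, assumed here to be the grouping set's ordinal.
 */
final class GroupingSetsAggregationSketch {

  /** Rows are laid out as [key columns..., Long value]; returns SUM(value) per group key. */
  static Map<List<Object>, Long> sum(List<Object[]> rows, int numKeyCols, int[] masks) {
    Map<List<Object>, Long> sums = new HashMap<>();
    for (Object[] row : rows) {
      long value = (Long) row[numKeyCols];
      for (int setId = 0; setId < masks.length; setId++) {
        List<Object> key = new ArrayList<>(numKeyCols + 1);
        for (int col = 0; col < numKeyCols; col++) {
          boolean grouped = (masks[setId] & (1 << col)) != 0;
          key.add(grouped ? row[col] : null); // rolled-up column becomes NULL
        }
        key.add(setId); // "$groupingId": keeps rolled-up NULLs apart from real-data NULLs
        sums.merge(key, value, Long::sum);
      }
    }
    return sums;
  }
}
```

Suppose one row has `country` = US and a real NULL `city`. Without the trailing ordinal, that detail row's key `[US, null]` would equal the per-country subtotal's key, and the two sums would be merged.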
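The MSE rewrite can be pictured at the SQL level. The sketch is a hypothetical, string-based illustration only. `GroupingSetsExpander` works on Calcite `LogicalAggregate` nodes, and a real rewrite would also type each `NULL` to match its column. The sketch emits one branch per mask (same assumed bit convention as above). Each branch projects rolled-up columns as `NULL` and each `GROUPING(col)` as a constant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical, string-level illustration of the UNION ALL expansion. The real expander rewrites
 * Calcite LogicalAggregate nodes, not SQL text, and would give each NULL literal the column's type.
 * Assumed mask convention: bit i set means union column i is grouped in that set.
 */
final class UnionAllExpansionSketch {

  static String expand(String table, List<String> cols, String aggExpr, int[] masks) {
    List<String> branches = new ArrayList<>();
    for (int mask : masks) {
      StringJoiner select = new StringJoiner(", ");
      StringJoiner groupBy = new StringJoiner(", ");
      StringJoiner groupingFlags = new StringJoiner(", ");
      for (int i = 0; i < cols.size(); i++) {
        boolean grouped = (mask & (1 << i)) != 0;
        select.add(grouped ? cols.get(i) : "NULL"); // rolled-up column projected as NULL
        if (grouped) {
          groupBy.add(cols.get(i));
        }
        groupingFlags.add(grouped ? "0" : "1"); // GROUPING(col) is a per-branch constant
      }
      String branch = "SELECT " + select + ", " + aggExpr + ", " + groupingFlags + " FROM " + table;
      if (groupBy.length() > 0) {
        branch += " GROUP BY " + groupBy; // the empty set becomes a global aggregate
      }
      branches.add(branch);
    }
    return String.join("\nUNION ALL\n", branches);
  }
}
```

For `ROLLUP(country, city)` (masks `3, 1, 0`), the sketch produces three branches:

- one grouped by `country, city`;
- one grouped by `country`;
- a global aggregate, which carries `1, 1` as its `GROUPING` constants.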
## Testing

- Unit tests: bit conventions, scalar functions, parser normalization + `GROUPING` rewrite, Thrift wire round-trip.
- In-process server-execution + broker-reduce tests (`GroupingSetsQueriesTest`). They cover ROLLUP/CUBE/GROUPING SETS, `GROUPING`/`GROUPING_ID`, HAVING, both ORDER BY paths, the empty-server schema, and the multi-valued-column rejection.
- MSE planner tests (`GroupingSetsPlannerTest`), plus the 191 existing planner tests (no regressions).
- Cluster integration test (`GroupingSetsTest`) running every query on **both** engines.

## Limitations / follow-ups

- Multi-valued group-by columns inside grouping sets are rejected with a clear error (single-valued only for now).
- The new `groupingSetsMasks` Thrift field is `optional` and wire-compatible, but grouping-set queries require all servers to be upgraded. A server that has not been upgraded would ignore the field and run a plain GROUP BY. This is a new query capability with no regression for existing queries, and it should be noted in the release notes. A broker-side min-version gate is a sensible follow-up.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
