xiangfu0 opened a new pull request, #18662: URL: https://github.com/apache/pinot/pull/18662
## Summary

Adds native **`GROUP BY GROUPING SETS (...)` / `ROLLUP(...)` / `CUBE(...)`** and the **`GROUPING(...)` / `GROUPING_ID(...)`** functions (PostgreSQL semantics) to Apache Pinot's **single-stage (v1) query engine**.

Previously these constructs were unsupported in the single-stage engine: the parser flattened `GROUP BY` into a flat list, and `PinotQuery`, `QueryContext`, the group-key generators, and the broker reduce path all assumed a single flat grouping.

```sql
SELECT country, city, SUM(revenue), GROUPING(city)
FROM sales
GROUP BY ROLLUP(country, city)
```

## Design

- A grouping-sets query is represented as the **union** of all grouping columns plus a list of **per-set bitmasks** (ROLLUP/CUBE expanded to grouping sets at parse time in `CalciteSqlParser`).
- Each input row is expanded, in a **single scan**, into one group per grouping set, reusing the existing multi-value (`int[][]`) aggregation path (`GroupingSetsGroupKeyGenerator`).
- A synthetic **`$grouping_id`** key column (the per-set bitmask) is appended after the union columns so rows from different sets never merge (e.g. a genuine `(a, NULL)` detail row stays distinct from a rolled-up `(a, NULL)` subtotal), and it powers `GROUPING()` / `GROUPING_ID()` at the broker.
- Grouping-set key columns are serialized **null-aware regardless of the query's null-handling option**, since rolled-up columns are always NULL.
- `GROUPING(args...)` is evaluated at the broker in post-aggregation by extracting the relevant bits from `$grouping_id` (works in SELECT, HAVING, and ORDER BY).
- Multi-value grouping columns (Cartesian expansion) and filtered aggregations are supported. Per-segment group trim is bucketed **per grouping set**, so a global top-K cannot starve a low-magnitude set such as the grand total.

## Scope / limitations

- **Single-stage engine only.** (The multi-stage engine parses grouping sets via Calcite but does not execute them correctly; this is pre-existing and out of scope here.)
- Star-tree and `server.returnFinalResult` are bypassed for grouping-set queries; `DISTINCT` + grouping sets is rejected.

## Backward compatibility / rolling upgrade

⚠️ Adds an optional `groupingSets` field to the `PinotQuery` Thrift wire object. A **new broker** sending a grouping-sets query to a **not-yet-upgraded server** fails with an actionable error (the reducer detects the missing `$grouping_id` column) instead of returning a silently-wrong result. **Upgrade servers before brokers.** A proper server-capability negotiation is left as a follow-up.

## Testing

- **Unit:** parser expansion (ROLLUP → prefixes, CUBE → power set, GROUPING SETS, mixed, dedup, grand total) and column/set-count limit rejections; per-set bucketed trim (`TableResizerTest`).
- **Integration (`GroupingSetsQueriesTest`, real cluster, ≥2 segments, 14 tests):** the genuine-vs-rolled-up NULL discriminator, NULL round-trip with null handling on **and** off, INT/LONG/DOUBLE/STRING key types, GROUPING in SELECT/HAVING/ORDER BY, multi-value columns, filtered aggregations, ORDER BY on an aggregation, and a plain-`GROUP BY` regression.

## Follow-ups (not in this PR)

- Server capability negotiation to harden the rolling-upgrade guard.
- Dictionary-id fast path in the grouping-sets generator (performance).
- Multi-stage engine support for grouping sets.

## Labels

`feature`, `backward-incompat` (new optional Thrift field; servers must be upgraded before brokers), `release-notes`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
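
## Illustration of the Design section (not code from this PR)

The ideas in the Design section (expanding ROLLUP/CUBE into grouping sets, tagging each set with a bitmask, expanding each row into one key per set, and deriving `GROUPING()` from the mask) can be sketched in a few lines of Python. This is only a rough model for readers, not Pinot's Java implementation. All function names are hypothetical, and the bit layout (bit `i` set means union column `i` is rolled up) is an assumption; the PR text does not specify the actual encoding of `$grouping_id`. The `grouping` function follows the PostgreSQL convention that the rightmost argument is the least-significant bit and a bit is 1 when the column is not part of the current grouping set.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(a, b) -> (a, b), (a), (): every prefix, longest first."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(a, b) -> the power set of the columns, largest subsets first."""
    return [s for n in range(len(cols), -1, -1) for s in combinations(cols, n)]


def rolled_up_mask(union_cols, grouping_set):
    """Assumed layout: bit i is set when union column i is rolled up."""
    mask = 0
    for i, col in enumerate(union_cols):
        if col not in grouping_set:
            mask |= 1 << i
    return mask


def expand_row(union_cols, masks, row):
    """Emit one group key per grouping set for a single input row.

    Rolled-up columns become None, and the mask is appended as a trailing
    key component (standing in for the synthetic $grouping_id column).
    """
    keys = []
    for mask in masks:
        key = tuple(
            None if (mask >> i) & 1 else row[col]
            for i, col in enumerate(union_cols)
        )
        keys.append(key + (mask,))
    return keys


def grouping(union_cols, mask, *args):
    """PostgreSQL-style GROUPING(args...): rightmost argument is the low bit."""
    result = 0
    for col in args:
        bit = (mask >> union_cols.index(col)) & 1
        result = (result << 1) | bit
    return result


if __name__ == "__main__":
    union = ["country", "city"]
    masks = [rolled_up_mask(union, s) for s in rollup_sets(union)]
    # A detail row whose city is genuinely NULL.
    for key in expand_row(union, masks, {"country": "US", "city": None}):
        print(key, "GROUPING(city) =", grouping(union, key[-1], "city"))
```

Under these assumptions, the detail row `("US", None)` and the `country` subtotal share the same visible columns but differ in the trailing mask (0 versus 2), which is why the extra key component keeps a genuine NULL from merging with a rolled-up one.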
