kosiew commented on code in PR #19252:
URL: https://github.com/apache/datafusion/pull/19252#discussion_r2621665000
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -177,6 +177,9 @@ pub struct PhysicalGroupBy {
/// expression in null_expr. If `groups[i][j]` is true, then the
/// j-th expression in the i-th group is NULL, otherwise it is `expr[j]`.
groups: Vec<Vec<bool>>,
+ /// True when GROUPING SETS/CUBE/ROLLUP are used so `__grouping_id` should
+ /// be included in the output schema.
+ has_grouping_set: bool,
Review Comment:
Thanks for the suggestion.
I think keeping a separate `has_grouping_set: bool` is actually the cleaner
and more approachable design here, for a few reasons.
**Clear intent**
The flag says exactly what it means: *should `__grouping_id` be part of the
output schema?* That intent is immediately obvious when reading the code,
instead of having to infer behavior from the state of an `Option`.
**Easier to reason about**
With an explicit boolean, the invariant is simple:
* `true` → grouping-set mode
* `false` → simple `GROUP BY`
Using an `Option` would require interpreting things like `Some(vec![])` vs
`Some(vec![vec![false]])`.
**More readable in practice**
Checks like:
```rust
if self.has_grouping_set { ... }
```
are easier to scan and understand than repeatedly pattern-matching on an
`Option`
**Fits the existing logic nicely**
The current `from_pre_group()` logic already reads very clearly:
```rust
let groups = if self.expr.is_empty() && !self.has_grouping_set {
// No GROUP BY expressions - should have no groups
vec![]
} else {
vec![vec![false; num_exprs]]
};
```
With an `Option`-based approach, this becomes harder to follow and less
explicit about *why* we’re making this distinction.
**Plays well with protobufs**
A simple `bool` maps cleanly to the proto definition and avoids extra
conversion or interpretation logic between Rust `Option`s and protobuf
semantics.
**Separation of concerns**
`groups` describes the *structure* of grouping (which expressions are used
where), while `has_grouping_set` independently captures whether grouping-set
mode is enabled at all (which affects schema and grouping ID generation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]