jverhoeks opened a new issue, #21570:
URL: https://github.com/apache/datafusion/issues/21570
**Describe the bug**
When `GROUP BY ROLLUP` is used with a query that produces 0 input rows,
DataFusion returns 0 output rows. The SQL standard requires ROLLUP to always
produce the grand total row (with NULL grouping keys and aggregate results
computed over the empty set).
**To Reproduce**
```sql
CREATE TABLE test (category VARCHAR, value INT) AS VALUES ('a', 1);
-- With data: works correctly (returns 2 rows: group + grand total)
SELECT category, count(*) FROM test GROUP BY ROLLUP(category);
-- Returns: [('a', 1), (NULL, 1)] ✅ correct
-- With empty input: returns 0 rows instead of 1
SELECT category, count(*) FROM test WHERE value < 0 GROUP BY
ROLLUP(category);
-- Returns: [] ❌ should return [(NULL, 0)]
-- Without GROUP BY, empty input correctly returns 1 row:
SELECT count(*) FROM test WHERE value < 0;
-- Returns: [(0)] ✅ correct
```
**Expected behavior**
Per SQL standard (and PostgreSQL, Trino, Oracle behavior):
```sql
SELECT category, count(*) FROM test WHERE value < 0 GROUP BY
ROLLUP(category);
-- Should return: [(NULL, 0)]
```
The grand total grouping set (the empty grouping set that ROLLUP always
includes) should produce a row even when there are zero input rows, because
aggregate functions like `count(*)` have well-defined behavior on empty sets
(returns 0).
**Additional context**
- Tested on DataFusion 52.3.0
- Same behavior with `CUBE` and explicit `GROUPING SETS((), (category))`
- Regular `GROUP BY category` correctly returns 0 rows with empty input (no
grand total expected)
- This affects 6 TPC-DS queries (q18, q27, q36, q67, q70, q86) at small
scale factors where certain joins produce 0 matching rows
**Root cause (likely)**
DataFusion's grouped aggregation implementation skips producing output
batches when the hash table is empty (no input rows). For regular `GROUP BY`,
this is correct. For `ROLLUP`/`CUBE`/`GROUPING SETS` that include the empty
grouping set `()`, a row should always be produced for that set.
The fix would need to check if any grouping set is the empty set, and if so,
produce a row with NULL grouping keys and empty-set aggregate values (count=0,
sum=NULL, etc.) even when the hash table has no entries.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]