jverhoeks opened a new issue, #21570:
URL: https://github.com/apache/datafusion/issues/21570

   **Describe the bug**
   
   When `GROUP BY ROLLUP` is used with a query that produces 0 input rows, 
DataFusion returns 0 output rows. The SQL standard requires ROLLUP to always 
produce the grand total row (with NULL grouping keys and aggregate results 
computed over the empty set).
   
   **To Reproduce**
   
   ```sql
   CREATE TABLE test (category VARCHAR, value INT) AS VALUES ('a', 1);
   
   -- With data: works correctly (returns 2 rows: group + grand total)
   SELECT category, count(*) FROM test GROUP BY ROLLUP(category);
   -- Returns: [('a', 1), (NULL, 1)]  ✅ correct
   
   -- With empty input: returns 0 rows instead of 1
   SELECT category, count(*) FROM test WHERE value < 0 GROUP BY 
ROLLUP(category);
   -- Returns: []  ❌ should return [(NULL, 0)]
   
   -- Without GROUP BY, empty input correctly returns 1 row:
   SELECT count(*) FROM test WHERE value < 0;
   -- Returns: [(0)]  ✅ correct
   ```
   
   **Expected behavior**
   
   Per SQL standard (and PostgreSQL, Trino, Oracle behavior):
   
   ```sql
   SELECT category, count(*) FROM test WHERE value < 0 GROUP BY 
ROLLUP(category);
   -- Should return: [(NULL, 0)]
   ```
   
   The grand total grouping set (the empty grouping set that ROLLUP always 
includes) should produce a row even when there are zero input rows, because 
aggregate functions like `count(*)` have well-defined behavior on empty sets 
(returns 0).
   
   **Additional context**
   
   - Tested on DataFusion 52.3.0
   - Same behavior with `CUBE` and explicit `GROUPING SETS((), (category))`
   - Regular `GROUP BY category` correctly returns 0 rows with empty input (no 
grand total expected)
   - This affects 6 TPC-DS queries (q18, q27, q36, q67, q70, q86) at small 
scale factors where certain joins produce 0 matching rows
   
   **Root cause (likely)**
   
   DataFusion's grouped aggregation implementation skips producing output 
batches when the hash table is empty (no input rows). For regular `GROUP BY`, 
this is correct. For `ROLLUP`/`CUBE`/`GROUPING SETS` that include the empty 
grouping set `()`, a row should always be produced for that set.
   
   The fix would need to check if any grouping set is the empty set, and if so, 
produce a row with NULL grouping keys and empty-set aggregate values (count=0, 
sum=NULL, etc.) even when the hash table has no entries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to