qianzii2 opened a new issue, #22505:
URL: https://github.com/apache/datafusion/issues/22505

   Hi everyone, I'm very new to DataFusion (and databases in general — I come 
from a healthcare background, learned SQL from Excel, and only started touching 
Rust recently). I have an idea for a SQL feature, and I've been banging my head 
against it for a while. I'm hoping someone can tell me where I'm fundamentally 
wrong, or point me to a simpler path.
   
   ### The idea: EXBY
   
   I often want to compute aggregates at multiple granularities in one query. 
For example:
   
   ```sql
   -- I want SUM(c) by (a,b), and also SUM(c) by (b) only
   -- Currently I have to self-join, which scans twice
   ```
   
   So I thought of this syntax:
   
   ```sql
   SELECT a, b, SUM(c), EXBY(SUM(c), a) FROM t GROUP BY a, b;
   ```
   
   `EXBY(SUM(c), a)` means: ignore column `a` when grouping, only group by the 
remaining columns `(b)`.
   
   ### What I've tried (and where I got stuck)
   
   I've tried several approaches. All of them either don't work, or work but 
defeat the purpose. The core problem seems to be **output alignment** — how to 
combine two aggregation results with different granularities into a single row.
   
   **Approach 1: SQL rewrite (the only one that "works")**
   
   I rewrote the EXBY query into an equivalent JOIN:
   
   ```
   SELECT main.a, main.b, main.sum_c, exby.exby_sum
   FROM (SELECT a, b, SUM(c) FROM t GROUP BY a, b) main
   JOIN (SELECT a, SUM(c) FROM t GROUP BY a) exby
   ON main.a = exby.a
   ```
   
   This is functionally correct, and I put it into my CLI tool. But it scans 
the table twice, which defeats the original purpose (avoiding redundant scans). 
So this is just a functional hack, not a real solution.
   
   **Approach 2: Two accumulators inside one AggregateExec**
   
   My thinking: the main aggregation and the EXBY aggregation share the same 
input stream. Why not just run **two accumulators** in one `AggregateExec`?
   
   ```
   Input batches arrive
   ├─ Main accumulator: update_batch() → groups by (a, b)
   └─ EXBY accumulator: update_batch() → groups by (b) only
   ```
   
   One scan, no JOIN. Seemed obvious.
   
   But I got stuck on the **emit side**. When the main accumulator emits a 
batch, the EXBY accumulator also needs to emit matching values. The problem: 
one EXBY group maps to multiple main groups (e.g., `b=1` maps to both `(1,1)` 
and `(2,1)`). The two accumulators don't have a one-to-one correspondence in 
their output indices.
   
   I dug into `row_hash.rs` and `GroupOrdering`, trying to figure out how to 
align the two outputs, but got completely lost in the state machine. I suspect 
the current emit path assumes one set of groups, not two sets with different 
cardinalities that need to be aligned, but I might be wrong.
   
   **Approach 3: Synchronized EmitTo**
   
   I also thought about using `EmitTo::First(n)` to synchronize the emissions: 
when the main group `(a=1,b=1)` is complete, the EXBY group `(b=1)` might also 
be complete (if the input is sorted by `(a,b)`, then `(b)` is a prefix and 
should also be complete). So both could emit together, and the results could be 
zipped.
   
   But I couldn't figure out how to make this work in practice. The EXBY 
accumulator has fewer groups than the main accumulator, so `EmitTo::First(n)` 
on the main side doesn't map cleanly to `EmitTo::First(m)` on the EXBY side. 
And when the input is not sorted, this whole idea falls apart.
   
   Again, I got lost in the `GroupedHashAggregateStream` internals and couldn't 
make it work.
   
   **Approach 4: The GROUPING SETS confusion**
   
   What really confuses me is that `GROUPING SETS` can compute multiple 
aggregation levels in one scan:
   
   ```sql
   SELECT a, b, SUM(c) FROM t GROUP BY GROUPING SETS ((a,b), (b));
   ```
   
   This does almost exactly what I want — it computes both the `(a,b)` and 
`(b)` aggregations in a single pass. So the engine *can* handle multiple 
grouping granularities.
   
   But GROUPING SETS outputs them as **separate rows**, whereas I need them 
**combined into the same row**. It feels like the data is there, but I can't 
figure out how to stitch it back together. Is the "combine into the same row" 
requirement the fundamental blocker?
   
   ### My question (or plea for guidance)
   
   I've spent a lot of time on this and have to admit I'm out of my depth. I 
don't have a clean proposal or PR. I'm just really curious:
   
   - Is the "two accumulators" idea fundamentally flawed, or is it possible but 
just beyond my current ability?
   - Is the output alignment problem (one EXBY group → multiple main rows) 
something that can be solved within the current `AggregateExec` architecture, 
or does it require a new operator?
   - Are there any similar features or past discussions I can learn from? I've 
looked at `retract_batch` and `EmitTo` but they seem designed for different use 
cases.
   
   Even a "this won't work because X, go read Y" would be incredibly helpful. 
I'm here to learn.
   
   Sorry for the long post, and thanks for reading.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to