qianzii2 opened a new issue, #22505: URL: https://github.com/apache/datafusion/issues/22505
Hi everyone, I'm very new to DataFusion (and databases in general — I come from a healthcare background, learned SQL from Excel, and only started touching Rust recently). I have an idea for a SQL feature, and I've been banging my head against it for a while. I'm hoping someone can tell me where I'm fundamentally wrong, or point me to a simpler path. ### The idea: EXBY I often want to compute aggregates at multiple granularities in one query. For example: ```sql -- I want SUM(c) by (a,b), and also SUM(c) by (b) only -- Currently I have to self-join, which scans twice ``` So I thought of this syntax: ```sql SELECT a, b, SUM(c), EXBY(SUM(c), a) FROM t GROUP BY a, b; ``` `EXBY(SUM(c), a)` means: ignore column `a` when grouping, only group by the remaining columns `(b)`. ### What I've tried (and where I got stuck) I've tried several approaches. All of them either don't work, or work but defeat the purpose. The core problem seems to be **output alignment** — how to combine two aggregation results with different granularities into a single row. **Approach 1: SQL rewrite (the only one that "works")** I rewrote the EXBY query into an equivalent JOIN: ``` SELECT main.a, main.b, main.sum_c, exby.exby_sum FROM (SELECT a, b, SUM(c) FROM t GROUP BY a, b) main JOIN (SELECT a, SUM(c) FROM t GROUP BY a) exby ON main.a = exby.a ``` This is functionally correct, and I put it into my CLI tool. But it scans the table twice, which defeats the original purpose (avoiding redundant scans). So this is just a functional hack, not a real solution. **Approach 2: Two accumulators inside one AggregateExec** My thinking: the main aggregation and the EXBY aggregation share the same input stream. Why not just run **two accumulators** in one `AggregateExec`? ``` Input batches arrive ├─ Main accumulator: update_batch() → groups by (a, b) └─ EXBY accumulator: update_batch() → groups by (b) only ``` One scan, no JOIN. Seemed obvious. But I got stuck on the **emit side**. When the main accumulator emits a batch, the EXBY accumulator also needs to emit matching values. The problem: one EXBY group maps to multiple main groups (e.g., `b=1` maps to both `(1,1)` and `(2,1)`). The two accumulators don't have a one-to-one correspondence in their output indices. I dug into `row_hash.rs` and `GroupOrdering`, trying to figure out how to align the two outputs, but got completely lost in the state machine. I suspect the current emit path assumes one set of groups, not two sets with different cardinalities that need to be aligned, but I might be wrong. **Approach 3: Synchronized EmitTo** I also thought about using `EmitTo::First(n)` to synchronize the emissions: when the main group `(a=1,b=1)` is complete, the EXBY group `(b=1)` might also be complete (if the input is sorted by `(a,b)`, then `(b)` is a prefix and should also be complete). So both could emit together, and the results could be zipped. But I couldn't figure out how to make this work in practice. The EXBY accumulator has fewer groups than the main accumulator, so `EmitTo::First(n)` on the main side doesn't map cleanly to `EmitTo::First(m)` on the EXBY side. And when the input is not sorted, this whole idea falls apart. Again, I got lost in the `GroupedHashAggregateStream` internals and couldn't make it work. **Approach 4: The GROUPING SETS confusion** What really confuses me is that `GROUPING SETS` can compute multiple aggregation levels in one scan: ```sql SELECT a, b, SUM(c) FROM t GROUP BY GROUPING SETS ((a,b), (b)); ``` This does almost exactly what I want — it computes both the `(a,b)` and `(b)` aggregations in a single pass. So the engine *can* handle multiple grouping granularities. But GROUPING SETS outputs them as **separate rows**, whereas I need them **combined into the same row**. It feels like the data is there, but I can't figure out how to stitch it back together. Is the "combine into the same row" requirement the fundamental blocker? ### My question (or plea for guidance) I've spent a lot of time on this and have to admit I'm out of my depth. I don't have a clean proposal or PR. I'm just really curious: - Is the "two accumulators" idea fundamentally flawed, or is it possible but just beyond my current ability? - Is the output alignment problem (one EXBY group → multiple main rows) something that can be solved within the current `AggregateExec` architecture, or does it require a new operator? - Are there any similar features or past discussions I can learn from? I've looked at `retract_batch` and `EmitTo` but they seem designed for different use cases. Even a "this won't work because X, go read Y" would be incredibly helpful. I'm here to learn. Sorry for the long post, and thanks for reading. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
