Rachelint commented on code in PR #11943:
URL: https://github.com/apache/datafusion/pull/11943#discussion_r1720011628

##########
datafusion/expr-common/src/groups_accumulator.rs:
##########
@@ -31,6 +31,13 @@ pub enum EmitTo {
     /// For example, if `n=10`, group_index `0, 1, ... 9` are emitted
     /// and group indexes '`10, 11, 12, ...` become `0, 1, 2, ...`.
     First(usize),
+    /// Emit all groups managed by blocks
+    AllBlocks,
+    /// Emit only the first `n` group blocks,
+    /// similar as `First`, but used in blocked `GroupValues` and `GroupAccumulator`.
+    ///
+    /// For example, `n=3`, `block size=4`, finally 12 groups will be returned.
+    FirstBlocks(usize),

Review Comment:
   > > use the iterator approach to impl AllBlocks and FirstBlocks defined now
   >
   > Not sure how does this work, but it looks like a neat idea. If we apply the same idea to "element" (First and All), and consider it as a specialized case with block_size = 1, I think we could end up a pretty nice abstraction. Probably we just need `EmitTo::Block(block_size)` 🤔 However, it is too far way from now. 😆

   🤔 Yes, the other emit modes can indeed be seen as special cases with a specific block size in the iterator approach. But considering performance, it is better to keep `batch_size == block_size`.

   After introducing the iterator approach, the sketch took only 200+ lines of code, compared to 600+ lines in the stale version of the sketch. The main work is just adding a stream state, `ExecutionState::ProducingBlocks(blocks)`.
   https://github.com/Rachelint/arrow-datafusion/blob/d79d912d1677549c825cafc405911973ace0df46/datafusion/physical-plan/src/aggregates/row_hash.rs#L728

   Maybe it can show how the blocked optimization works.
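
   To make the block semantics discussed in this comment concrete, here is a minimal sketch. It is not the code from the PR or the linked branch: the `EmitTo` below is a simplified stand-in for the enum in the diff, and `emitted_groups` and `emit_blocks` are hypothetical helpers introduced only for illustration. It assumes groups are plain `u32` values and that the last block may be only partially filled.

```rust
/// Simplified stand-in for the `EmitTo` enum shown in the diff
/// (illustration only, not the DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum EmitTo {
    All,
    First(usize),
    AllBlocks,
    FirstBlocks(usize),
}

/// Hypothetical helper: number of groups a mode would emit, given `total`
/// stored groups and a fixed `block_size`.
/// Assumption: the last block may be partially filled, so the result is
/// capped at `total`.
fn emitted_groups(mode: EmitTo, total: usize, block_size: usize) -> usize {
    match mode {
        EmitTo::All | EmitTo::AllBlocks => total,
        EmitTo::First(n) => n.min(total),
        EmitTo::FirstBlocks(n) => n.saturating_mul(block_size).min(total),
    }
}

/// Hypothetical iterator-style emission: split the stored groups into
/// blocks of `block_size` and hand them out one block at a time.
/// With `block_size == 1` this degenerates to element-wise emission.
fn emit_blocks(groups: Vec<u32>, block_size: usize) -> impl Iterator<Item = Vec<u32>> {
    assert!(block_size > 0, "block_size must be non-zero");
    let blocks: Vec<Vec<u32>> = groups
        .chunks(block_size)
        .map(|chunk| chunk.to_vec())
        .collect();
    blocks.into_iter()
}
```

   Under these assumptions, `emitted_groups(EmitTo::FirstBlocks(3), 20, 4)` gives 12, which matches the `n=3`, `block size=4` example in the doc comment, and `emit_blocks((0..10).collect(), 4)` yields blocks of 4, 4, and 2 groups.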