JasonLi-cn commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2283323779
> > > I agree, finally it should be a big change which switches the group values and related states managed by block like duckdb , and I am working on this(#11931). > > > > > > I think personally suggest sketching out what this would look like in a first PR without worrying about getting all the tests passing / compiling etc. > > If we try to port all code at once to being managed in blocks it is going to be a very large change > > I am thinking maybe we can have a incremental approach (like for example separately adding the ability to do blocked emission for https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.GroupsAccumulator.html and > > https://github.com/apache/datafusion/blob/fd237f8705b18fa089fdfb8dd5b04655ccb4d691/datafusion/physical-plan/src/aggregates/group_values/mod.rs#L37-L55 > > > > If we can set this up so that we get the pattern and a few common implementations setup in the first PR then we can make subsequent PRs to port over the other parts of the aggregation 🤔 > > Yes, I want to do the similar things for > > > incremental approach > > Yes... I try to do something like it but still not thorough enough, and I found it is actually hard to support the exact `EmitTo::First` used by streaming aggr after I found the reason why some tests failed... And we actually should disable the blocked mode in streaming aggr. > > I am planning to switch to the special blocked emission impl now... It seems we can do two things to introduce the new emit modes? > > * Define a `support_blocked_emission` for `GroupValues` and `GroupAccumulator`. > * Add two emit enums > > ``` > pub enum EmitTo { > /// Emit all groups > All, > /// Emit only the first `n` groups and shift all existing group > /// indexes down by `n`. > /// > /// For example, if `n=10`, group_index `0, 1, ... 9` are emitted > /// and group indexes '`10, 11, 12, ...` become `0, 1, 2, ...`. > First(usize), > > AllBlocks, > > FirstBlocks(usize), > } > ``` > > What do you think this way to supprot the special blocked emission? Do we need to put `batch_size` in `AllBlocks` and `FirstBlocks` 🤔 ? And that's exactly what I was planning on doing. But I found that the changes to the code were very large. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
