Rachelint commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2282832444
> > I agree, finally it should be a big change which switches the group values and related states managed by block like duckdb , and I am working on this(#11931). > > I think personally suggest sketching out what this would look like in a first PR without worrying about getting all the tests passing / compiling etc. > > If we try to port all code at once to being managed in blocks it is going to be a very large change > > I am thinking maybe we can have a incremental approach (like for example separately adding the ability to do blocked emission for https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.GroupsAccumulator.html and > > https://github.com/apache/datafusion/blob/fd237f8705b18fa089fdfb8dd5b04655ccb4d691/datafusion/physical-plan/src/aggregates/group_values/mod.rs#L37-L55 > > If we can set this up so that we get the pattern and a few common implementations setup in the first PR then we can make subsequent PRs to port over the other parts of the aggregation 🤔 Yes, I want to do the similar things for > incremental approach Yes, I am planning to make it incrementally, too. For the `GroupValues`s, I make the compatibility work here to make it able to process the single big block or the multiple small blocks. And I am just trying to impl the block based managing for `GroupValuesRows` impls now, other impls will keep the old logic until we optimize them. https://github.com/apache/datafusion/blob/56399f7bd87c70b94be8b39d38fb5484289d3915/datafusion/physical-plan/src/aggregates/row_hash.rs#L654-L683 For the `GroupAccumulator`s, I impl a function to convert the `new block based group idx` to the `old flat group idx`, so the `GroupAccumulator`s can almost keep the old process logic. Based on this, we can swith them to the new style incrementally. https://github.com/apache/datafusion/blob/56399f7bd87c70b94be8b39d38fb5484289d3915/datafusion/physical-plan/src/aggregates/group_values/mod.rs#L60 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
