2010YOUY01 commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4742353436
> > Marking as a draft as I don't think this one is ready to merge quite yet and I am trying to clean up the review / merge queue
>
> Yes, and I think the whole feature will be suitable to push forward after the aggregation refactoring is stable.
>
> However, there are actually two parts included in this:
>
> * One part is about refactoring `GroupValues` and `GroupsAccumulator`
> * The other part is about applying the blocked logic in aggregation
>
> How about we split this PR into two or more? And push forward part one (`GroupValues` and `GroupsAccumulator`) in parallel with the aggregation refactoring?
>
> @alamb @2010YOUY01 @ariel-miculas

I think the steps are:

1. Complete https://github.com/apache/datafusion/issues/22710
2. Initial PR for blocked states: the major issue is to agree on the API changes for `GroupValues` and `GroupsAccumulator`, and on how to organize future work.
3. Update all `GroupValues` and `GroupsAccumulator` implementations (there are around 20 of them, IIRC).

Performance seems to be a nearly solved issue. The PoC already showed that high-cardinality cases are faster (with several micro-optimizations left on the table). Low-cardinality cases are slightly slower, but I think @alamb's suggestion in https://github.com/apache/datafusion/pull/22712#issuecomment-4672476038 is doable and should bring the performance back.

I suggest not trying to parallelize steps 1 and 2, as they will likely conflict with each other. Step 3 should be highly parallelizable.

As for the refactoring progress, I'd estimate it's about 50% complete. I haven't seen any major technical blockers so far; I just need some time to better structure the implementation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
