2010YOUY01 commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4742353436

   > > Marking as a draft as I don't think this one is ready to merge quite yet 
and I am trying to clean up the review / merge queue
   > 
   > Yes, and I think the whole feature will be ready to push forward once 
the aggregation refactoring is stable.
   > 
   > However, two parts are actually included in this:
   > 
   > * One part is about refactoring `GroupValues` and `GroupsAccumulator`
   > * The other part is about applying the blocked logic in aggregation
   > 
   > How about we split this PR into two or more, and push forward part 
one (`GroupValues` and `GroupsAccumulator`) in parallel with the aggregation 
refactoring? @alamb @2010YOUY01 @ariel-miculas
   
   I think the steps are:
   1. Complete https://github.com/apache/datafusion/issues/22710
   2. Initial PR for blocked states: the major issue is to agree on the API 
changes for `GroupValues` and `GroupsAccumulator`, and on how to organize 
future work.
   3. Update all `GroupValues` and `GroupsAccumulator` implementations (there 
are around 20 of them, IIRC)
   
   Performance seems to be a nearly solved issue: the PoC already showed that 
high-cardinality cases are faster (with several micro-optimizations left on the 
table). Low-cardinality cases are slightly slower, but I think @alamb's 
suggestion in 
https://github.com/apache/datafusion/pull/22712#issuecomment-4672476038 is 
doable and should bring the performance back.
   
   I suggest not trying to parallelize steps 1 and 2, as they will likely 
conflict with each other. Step 3 should be highly parallelizable.
   
   As for the refactoring progress, I'd estimate it's about 50% complete. I 
haven't seen any major technical blockers so far; I just need some time to 
better structure the implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

