[GitHub] [arrow-datafusion] alamb commented on pull request #6800: RFC: Demonstrate new `GroupHashAggregate` stream approach (runs more than 2x faster!)

via GitHub Sun, 02 Jul 2023 13:32:48 -0700


alamb commented on PR #6800:
URL: 
https://github.com/apache/arrow-datafusion/pull/6800#issuecomment-1616810203


   >  @alamb do you continue this PR on your own or would some form of 
assistance help? E.g. writing some of those accumulators?
   
   Thank you @Dandandan! My plan for this PR is to:
   1. "Complete" the avg accumulator: I need to handle NULLs properly and I 
need to implement filtering.
   2. Implement the `GroupsAccumulator` for a `dyn Accumulator`.
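
The two steps above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code in this PR: the trait shapes, the `update_batch` signature, the use of `Option<f64>` to model NULLs, the `&[bool]` filter, and the names `AvgGroupsAccumulator`, `GroupsAccumulatorAdapter` and `MaxAccumulator` are all assumptions made for the example. The real traits operate on Arrow arrays.

```rust
/// Hypothetical, simplified per-group accumulator (not DataFusion's real trait).
trait GroupsAccumulator {
    /// `group_indices[i]` is the group of row `i`; `values[i] == None` models
    /// a NULL input; `filter[i] == false` models a row rejected by a filter.
    fn update_batch(
        &mut self,
        values: &[Option<f64>],
        group_indices: &[usize],
        filter: Option<&[bool]>,
        total_num_groups: usize,
    );
    /// One result per group; `None` models a NULL result.
    fn evaluate(&self) -> Vec<Option<f64>>;
}

/// Step 1: AVG state kept in flat vectors indexed by group.
#[derive(Default)]
struct AvgGroupsAccumulator {
    sums: Vec<f64>,
    counts: Vec<u64>,
}

impl GroupsAccumulator for AvgGroupsAccumulator {
    fn update_batch(
        &mut self,
        values: &[Option<f64>],
        group_indices: &[usize],
        filter: Option<&[bool]>,
        total_num_groups: usize,
    ) {
        assert_eq!(values.len(), group_indices.len());
        // Grow (never shrink) the state to cover newly seen groups.
        if self.sums.len() < total_num_groups {
            self.sums.resize(total_num_groups, 0.0);
            self.counts.resize(total_num_groups, 0);
        }
        for (row, (value, &group)) in values.iter().zip(group_indices).enumerate() {
            let passes = filter.map_or(true, |f| f[row]);
            // NULL inputs and filtered-out rows contribute nothing.
            if let (true, Some(v)) = (passes, value) {
                self.sums[group] += *v;
                self.counts[group] += 1;
            }
        }
    }

    fn evaluate(&self) -> Vec<Option<f64>> {
        // A group with no contributing rows averages to NULL.
        self.sums
            .iter()
            .zip(&self.counts)
            .map(|(&sum, &count)| if count == 0 { None } else { Some(sum / count as f64) })
            .collect()
    }
}

/// Hypothetical, simplified row-at-a-time accumulator (one instance per group).
trait Accumulator {
    fn update(&mut self, value: Option<f64>);
    fn evaluate(&self) -> Option<f64>;
}

/// Step 2: adapter that drives one boxed `Accumulator` per group, so any
/// existing aggregate works without a specialized implementation.
struct GroupsAccumulatorAdapter {
    factory: Box<dyn Fn() -> Box<dyn Accumulator>>,
    states: Vec<Box<dyn Accumulator>>,
}

impl GroupsAccumulator for GroupsAccumulatorAdapter {
    fn update_batch(
        &mut self,
        values: &[Option<f64>],
        group_indices: &[usize],
        filter: Option<&[bool]>,
        total_num_groups: usize,
    ) {
        while self.states.len() < total_num_groups {
            self.states.push((self.factory)());
        }
        for (row, (value, &group)) in values.iter().zip(group_indices).enumerate() {
            if filter.map_or(true, |f| f[row]) {
                self.states[group].update(*value);
            }
        }
    }

    fn evaluate(&self) -> Vec<Option<f64>> {
        self.states.iter().map(|s| s.evaluate()).collect()
    }
}

/// Example row-at-a-time aggregate used to exercise the adapter.
#[derive(Default)]
struct MaxAccumulator {
    max: Option<f64>,
}

impl Accumulator for MaxAccumulator {
    fn update(&mut self, value: Option<f64>) {
        if let Some(v) = value {
            self.max = Some(self.max.map_or(v, |m| m.max(v)));
        }
    }
    fn evaluate(&self) -> Option<f64> {
        self.max
    }
}
```

In this sketch the specialized accumulator touches only flat vectors in its inner loop, while the adapter pays for a dynamic call and a separate heap-allocated state per group. That difference is the motivation for writing specialized versions of the commonly used aggregates.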
   
   It would be really helpful if you could then implement the other 
`RowAccumulators` (which I think is the minimum needed to avoid a performance 
regression).
   
   Once those pieces are in place I think we can switch out the group hash 
implementation.
   
   In my ideal world we would also unify the streaming group duplication 
(https://github.com/apache/arrow-datafusion/issues/6798) but I haven't spent 
enough time studying that code to figure out how best to abstract it out -- I 
think it is totally doable, it just needs some study.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
