[GitHub] [arrow-datafusion] ozankabak commented on pull request #6034: Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered

via GitHub Tue, 25 Apr 2023 16:40:49 -0700


ozankabak commented on PR #6034:
URL: 
https://github.com/apache/arrow-datafusion/pull/6034#issuecomment-1522548414


   > I'm not very familiar with the group by code anymore, but the use of 
per-group allocations does immediately stand out to me as at risk of thrashing 
the memory allocator, and consequently making the code not just slow but also 
wildly unpredictable. I'm aware this isn't something introduced by this PR, but 
revisiting this design (see 
https://github.com/apache/arrow-datafusion/issues/4973) may make it easier to 
make changes in this area without introducing regressions.
   
   I agree, we will be happy to take part in that effort.
   
   Going back to the scope of this PR: If everyone agrees, we will generate 
another changeset with Approach 2, and @alamb can check how the performance 
looks there (since our local benchmarks do not show a difference even now).
   
   We can then move forward with either that, or this version. Sounds good?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] ozankabak commented on pull request #6034: Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered

Reply via email to