[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6800: RFC: Demonstrate new `GroupHashAggregate` stream approach (runs more than 2x faster!)

via GitHub Sat, 08 Jul 2023 08:29:34 -0700


alamb commented on code in PR #6800:
URL: https://github.com/apache/arrow-datafusion/pull/6800#discussion_r1257291689



##########
datafusion/core/src/physical_plan/aggregates/row_hash.rs:
##########
@@ -111,6 +111,8 @@ pub(crate) struct GroupedHashAggregateStream {
     /// first element in the array corresponds to normal accumulators
     /// second element in the array corresponds to row accumulators
     indices: [Vec<Range<usize>>; 2],
+    // buffer to be reused to store hashes
+    hashes_buffer: Vec<u64>,

Review Comment:
    ❤️  this is a good change -- thanks @Dandandan . Pretty soon there will be 
*no* allocations while processing each batch (aka the hot loop) 🥳  -- I think 
with https://github.com/apache/arrow-datafusion/pull/6888 we can get rid of the 
counts in the sum accumulator



##########
datafusion/core/src/physical_plan/aggregates/row_hash.rs:
##########
@@ -111,6 +111,8 @@ pub(crate) struct GroupedHashAggregateStream {
     /// first element in the array corresponds to normal accumulators
     /// second element in the array corresponds to row accumulators
     indices: [Vec<Range<usize>>; 2],
+    // buffer to be reused to store hashes
+    hashes_buffer: Vec<u64>,

Review Comment:
    ❤️  this is a good change -- thanks @Dandandan . Pretty soon there will be 
*no* allocations while processing each batch (aka the hot loop) 🥳  -- I think 
with https://github.com/apache/arrow-datafusion/pull/6888 we can get rid of the 
counts in the sum accumulator



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6800: RFC: Demonstrate new `GroupHashAggregate` stream approach (runs more than 2x faster!)

Reply via email to