jorgecarleitao commented on a change in pull request #7687:
URL: https://github.com/apache/arrow/pull/7687#discussion_r463200434



##########
File path: rust/datafusion/src/execution/physical_plan/hash_aggregate.rs
##########
@@ -327,120 +278,47 @@ impl RecordBatchReader for GroupedHashAggregateIterator {
                 })
                 .collect::<ArrowResult<Vec<_>>>()?;
 
-            // create vector large enough to hold the grouping key
+            // create vector to hold the grouping key
             let mut key = Vec::with_capacity(group_values.len());
             for _ in 0..group_values.len() {
-                key.push(GroupByScalar::UInt32(0));
+                key.push(KeyScalar::UInt32(0));
             }
 
             // iterate over each row in the batch and create the accumulators for each grouping key
-            let mut accumulators: Vec<Rc<AccumulatorSet>> =

Review comment:
       Ok, I was able to run the benchmarks, but I do not observe any statistically significant difference:
   
   ```
   # the base commit on master:
   git checkout cd503c3f583dab4b94c9934d525664e5897ff06d 
   cargo bench
   ```
   
   leaves me with
   
   ```
   aggregate_query_no_group_by
                           time:   [121.39 us 121.55 us 121.73 us]
   Found 9 outliers among 100 measurements (9.00%)
     3 (3.00%) high mild
     6 (6.00%) high severe
   
   aggregate_query_group_by
                           time:   [170.22 us 170.75 us 171.47 us]
   Found 12 outliers among 100 measurements (12.00%)
     7 (7.00%) high mild
     5 (5.00%) high severe
   
   aggregate_query_group_by_with_filter
                           time:   [279.00 us 279.34 us 279.71 us]
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) high mild
     7 (7.00%) high severe
   ```
   
   followed by 
   
   ```
   # the latest commit on this branch:
   git checkout bbd9da7ce5b582587bce5c8ff8a228f5425e0113
   cargo bench
   ```
   
   leaves me with
   
   ```
   aggregate_query_no_group_by
                           time:   [122.19 us 122.38 us 122.58 us]
                           change: [-1.5876% +0.3639% +2.3348%] (p = 0.74 > 0.05)
                           No change in performance detected.
   Found 6 outliers among 100 measurements (6.00%)
     6 (6.00%) high severe
   
   aggregate_query_group_by
                           time:   [172.66 us 172.91 us 173.19 us]
                           change: [-0.9329% +1.2144% +3.2272%] (p = 0.28 > 0.05)
                           No change in performance detected.
   Found 12 outliers among 100 measurements (12.00%)
     6 (6.00%) high mild
     6 (6.00%) high severe
   
   aggregate_query_group_by_with_filter
                           time:   [282.30 us 282.70 us 283.14 us]
                           change: [-1.1013% +0.9700% +2.7683%] (p = 0.40 > 0.05)
                           No change in performance detected.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   ```
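
For reference, here is a rough sketch of how I read the "No change in performance detected" verdicts above. It is not Criterion's actual code: the `Verdict` enum and `classify` function are hypothetical names, and the thresholds (0.05 significance level, 1% noise threshold) are assumed to be Criterion's defaults. The change estimates and p-values are copied from the output above.

```rust
/// Possible outcomes of comparing a benchmark against its baseline.
/// (Hypothetical type for this sketch, not part of Criterion's API.)
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,
    WithinNoise,
    Improved,
    Regressed,
}

/// Classify a benchmark from the point estimate of its relative change
/// (0.01 means +1%) and the p-value of the comparison.
/// `significance` and `noise` are assumed thresholds.
fn classify(change: f64, p_value: f64, significance: f64, noise: f64) -> Verdict {
    // A p-value at or above the significance level means the difference
    // cannot be distinguished from random variation.
    if p_value >= significance {
        return Verdict::NoChange;
    }
    // Significant, but too small to matter.
    if change.abs() < noise {
        return Verdict::WithinNoise;
    }
    // A negative change means the benchmark got faster.
    if change < 0.0 {
        Verdict::Improved
    } else {
        Verdict::Regressed
    }
}

fn main() {
    // (benchmark name, relative change estimate, p-value) from the run above.
    let results = [
        ("aggregate_query_no_group_by", 0.003639, 0.74),
        ("aggregate_query_group_by", 0.012144, 0.28),
        ("aggregate_query_group_by_with_filter", 0.009700, 0.40),
    ];
    for &(name, change, p_value) in results.iter() {
        println!("{}: {:?}", name, classify(change, p_value, 0.05, 0.01));
    }
}
```

All three benchmarks have p-values well above 0.05, so each falls into the first branch regardless of the size of the estimated change.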



