[GitHub] [arrow-datafusion] alamb commented on pull request #1520: use bumpalo for GroupState

GitBox Wed, 05 Jan 2022 04:14:45 -0800


alamb commented on pull request #1520:
URL: 
https://github.com/apache/arrow-datafusion/pull/1520#issuecomment-1005634682



   It is fascinating that dropping `GroupState` (that is, running the destructors of its fields) consumes so much time in your profile.
   
   ```rust
   /// The state that is built for each output group.
   #[derive(Debug)]
   struct GroupState {
       /// The actual group by values, one for each group column
       group_by_values: Box<[ScalarValue]>,
   
       /// Accumulator state, one for each aggregate
       accumulator_set: Vec<AccumulatorItem>,
   
       /// Scratch space used to collect indices for input rows in a
       /// batch that have values to aggregate. Reset on each batch
       indices: Vec<u32>,
   }
   ```
   
   One way to confirm that the time really is spent dropping these fields is to temporarily skip the drops with code like this and see if it goes faster:
   
   ```rust
   impl Drop for GroupState {
       fn drop(&mut self) {
           // Test out skipping running `drop` on the different fields
           // to confirm calling their `Drop` is taking a long time
   
           // Note this LEAKS memory!
   
           // `take` swaps in an empty default; `forget` then leaks the old value
           std::mem::forget(std::mem::take(&mut self.group_by_values));
           std::mem::forget(std::mem::take(&mut self.accumulator_set));
           std::mem::forget(std::mem::take(&mut self.indices));
       }
   }
   ```
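   
   If it helps to check the effect in isolation first, here is a minimal standalone sketch of the same idea. `FakeGroupState`, `build_states`, `time_drop` and `time_forget` are hypothetical names made up for illustration, and the field types are simple stand-ins rather than the real `ScalarValue` / `AccumulatorItem`, so the numbers will only roughly indicate what to expect in DataFusion:
   
   ```rust
   use std::time::{Duration, Instant};
   
   /// Hypothetical stand-in for `GroupState`: three separate heap
   /// allocations per group, mirroring the shape of the real struct.
   #[allow(dead_code)]
   struct FakeGroupState {
       group_by_values: Box<[String]>,
       accumulator_set: Vec<Box<u64>>,
       indices: Vec<u32>,
   }
   
   /// Build `n` states, each owning several small heap allocations.
   fn build_states(n: usize) -> Vec<FakeGroupState> {
       (0..n)
           .map(|i| FakeGroupState {
               group_by_values: vec![i.to_string()].into_boxed_slice(),
               accumulator_set: vec![Box::new(i as u64)],
               indices: Vec::with_capacity(8),
           })
           .collect()
   }
   
   /// Time how long it takes to run the destructors of all states.
   fn time_drop(states: Vec<FakeGroupState>) -> Duration {
       let start = Instant::now();
       drop(states);
       start.elapsed()
   }
   
   /// Time the same hand-off, but leak the states instead of dropping them.
   fn time_forget(states: Vec<FakeGroupState>) -> Duration {
       let start = Instant::now();
       std::mem::forget(states); // LEAKS memory, for measurement only
       start.elapsed()
   }
   
   fn main() {
       let n = 1_000_000;
       let dropped = time_drop(build_states(n));
       let leaked = time_forget(build_states(n));
       println!("drop:   {:?}", dropped);
       println!("forget: {:?}", leaked);
   }
   ```
   
   If the `drop` timing is much larger than the `forget` timing here, that would be consistent with the per-group deallocations being the cost, which is the hypothesis the `Drop` experiment above is meant to test on the real type.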


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

