alamb commented on pull request #1520:
URL:
https://github.com/apache/arrow-datafusion/pull/1520#issuecomment-1005634682
It is fascinating that running `Drop` for `GroupState` consumes
so much time in your profile.
```rust
/// The state that is built for each output group.
#[derive(Debug)]
struct GroupState {
    /// The actual group by values, one for each group column
    group_by_values: Box<[ScalarValue]>,
    /// Accumulator state, one for each aggregate
    accumulator_set: Vec<AccumulatorItem>,
    /// Scratch space used to collect indices for input rows in a
    /// batch that have values to aggregate. Reset on each batch
    indices: Vec<u32>,
}
```
One way to confirm that the time really is spent running `Drop` is to
temporarily skip the drops, using code like this, and see if it goes faster:
```rust
impl Drop for GroupState {
    fn drop(&mut self) {
        // Test out skipping running `drop` on the different fields
        // to confirm calling their `Drop` is taking a long time.
        // Note this LEAKS memory!
        let t = std::mem::take(&mut self.group_by_values);
        std::mem::forget(t);
        let t = std::mem::take(&mut self.accumulator_set);
        std::mem::forget(t);
        let t = std::mem::take(&mut self.indices);
        std::mem::forget(t);
    }
}
```
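If it is easier to try the idea outside of DataFusion first, here is a rough standalone sketch of the same comparison. `FakeGroupState`, `make_states`, `time_drop` and `time_forget` are all hypothetical names made up for illustration (they are not part of DataFusion), and the `String` fields only stand in for `ScalarValue`, so the absolute numbers will not match your profile:

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `GroupState` (NOT the real DataFusion type):
/// every value owns a few separate heap allocations.
#[allow(dead_code)]
struct FakeGroupState {
    group_by_values: Box<[String]>,
    indices: Vec<u32>,
}

/// Build `n` states, each with its own small allocations.
fn make_states(n: usize) -> Vec<FakeGroupState> {
    (0..n)
        .map(|i| FakeGroupState {
            group_by_values: vec![i.to_string(), "group".to_string()].into_boxed_slice(),
            indices: vec![0; 8],
        })
        .collect()
}

/// Time a normal drop, which runs the destructor of every field of every state.
fn time_drop(states: Vec<FakeGroupState>) -> Duration {
    let start = Instant::now();
    drop(states);
    start.elapsed()
}

/// Time the "skip the destructors" path. Note this LEAKS memory!
fn time_forget(states: Vec<FakeGroupState>) -> Duration {
    let start = Instant::now();
    std::mem::forget(states);
    start.elapsed()
}
```

Calling `time_drop(make_states(n))` and `time_forget(make_states(n))` with a group count similar to the one in your query, in a release build, should give a feel for how much of the cost is just deallocation.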