ic4y commented on pull request #1520:
URL:
https://github.com/apache/arrow-datafusion/pull/1520#issuecomment-1004941980
From
```rust
struct Accumulators {
    map: RawTable<(u64, usize)>,
    group_states: Vec<GroupState>,
}
```
To
```rust
struct Accumulators {
    map: RawTable<(u64, usize)>,
    group_states: BumpVec<GroupState>,
}
```
By using bumpalo to allocate memory for group_states, the time spent
destructing group_states can be greatly reduced in the high-cardinality case;
dropping group_states barely registers in the pprof profile.
The test data has 350 million rows in total, with 50 million distinct user_id
values.
SQL: `select count(1) from (select user_id from event group by user_id)a`
**master:**
`drop_in_place<GroupState>` takes 6s (50%), total 14s

**bumpalo:**
`drop_in_place<GroupState>` takes 0s (not counted), total 8s (a 40%
improvement)

On the TPC-H benchmark there is almost no difference. I think the
reason is that the group cardinality is not high enough.
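
To illustrate why destruction cost grows with the number of groups, here is a minimal std-only sketch. It does not use bumpalo and is not the code in this PR. `OwnedGroupState` and `ArenaGroupStates` are hypothetical stand-ins, and the real `GroupState` holds more fields. The sketch only shows the general idea: when each group owns its own heap allocation, one destructor runs per group, whereas an arena-style layout keeps all groups in a few shared buffers that are released together.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical stand-in for a per-group state that owns its own heap allocation.
struct OwnedGroupState {
    #[allow(dead_code)] // held only to force a separate allocation per group
    values: Vec<u64>,
    drop_counter: Rc<Cell<usize>>,
}

impl Drop for OwnedGroupState {
    fn drop(&mut self) {
        // Record that a per-group destructor ran.
        self.drop_counter.set(self.drop_counter.get() + 1);
    }
}

// Builds `n` individually owned groups, drops them, and returns how many
// per-group destructors ran.
fn owned_destructor_calls(n: usize) -> usize {
    let counter = Rc::new(Cell::new(0));
    let groups: Vec<OwnedGroupState> = (0..n)
        .map(|i| OwnedGroupState {
            values: vec![i as u64, 1],
            drop_counter: Rc::clone(&counter),
        })
        .collect();
    drop(groups);
    counter.get()
}

// Arena-style layout (an assumption for illustration): all group values live
// in one shared buffer, and each group is a (start, len) range into it.
struct ArenaGroupStates {
    buffer: Vec<u64>,
    ranges: Vec<(usize, usize)>,
}

impl ArenaGroupStates {
    fn new() -> Self {
        Self { buffer: Vec::new(), ranges: Vec::new() }
    }

    // Appends a group's values to the shared buffer and returns its index.
    fn push_group(&mut self, values: &[u64]) -> usize {
        let start = self.buffer.len();
        self.buffer.extend_from_slice(values);
        self.ranges.push((start, values.len()));
        self.ranges.len() - 1
    }

    // Returns the values stored for group `idx`.
    fn group(&self, idx: usize) -> &[u64] {
        let (start, len) = self.ranges[idx];
        &self.buffer[start..start + len]
    }

    fn group_count(&self) -> usize {
        self.ranges.len()
    }
}

fn main() {
    let n = 1_000;
    println!("owned layout: {} destructor calls", owned_destructor_calls(n));

    let mut arena = ArenaGroupStates::new();
    for i in 0..n {
        arena.push_group(&[i as u64, 1]);
    }
    // Dropping `arena` releases two backing buffers, regardless of `n`.
    println!("arena layout: {} groups in 2 backing buffers", arena.group_count());
}
```

With 50 million groups, the owned layout would run 50 million destructors, while the arena-style layout frees a fixed number of buffers. This is consistent with the observation above that the difference only shows up at high cardinality.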
