tustvold commented on issue #7191: URL: https://github.com/apache/arrow-datafusion/issues/7191#issuecomment-1665729387
> I needed a BiMap

Yeah, I think you need both a priority queue to work out which groups to keep, along with a HashMap to work out which rows belong to which groups.

> I don't think we'd want to always evict groups, because we might not even need to add them in the first place if the value being aggregated is less/greater than the min/max of the priority queue - so it would be a no-op.

I was envisaging something like adding support to `GroupValues::intern` to optionally return a list of groups to evict, possibly as a `BooleanBuffer`. These would effectively be groups from previous calls to `GroupValues::intern` that are no longer needed. This list would then be fed to a new `GroupsAccumulator::evict` method to clear them out from the various aggregators, possibly using something relatively cheap like [`Vec::retain`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain).

Or something to that effect, just spitballing here. I really want to get Window functions using `GroupsAccumulator` so that we can get rid of the old scalar accumulators (#7112)
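
To make the two ideas in this comment concrete, here is a minimal sketch using only the standard library, not DataFusion code. It assumes a top-`k` over the largest `i64` values, uses a plain `&[bool]` as a stand-in for a `BooleanBuffer`, and both function names (`would_enter_top_k`, `evict`) are hypothetical. The first function is the "no-op" check from the quoted text; the second shows how a mask of groups to evict could be applied to per-group state with `Vec::retain`.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: would `value` change a top-`k` (largest values) set?
/// The heap is a min-heap (via `Reverse`), so `peek` yields the smallest
/// value currently retained.
fn would_enter_top_k(heap: &BinaryHeap<Reverse<i64>>, k: usize, value: i64) -> bool {
    if heap.len() < k {
        // Still room: every value is admitted.
        return true;
    }
    match heap.peek() {
        // Full heap: only a value that beats the current minimum matters;
        // anything else is a no-op and no group needs to be created.
        Some(Reverse(min)) => value > *min,
        // k == 0: nothing is ever retained.
        None => false,
    }
}

/// Hypothetical stand-in for the proposed eviction step: drops every
/// per-group slot whose mask entry is `true`.
fn evict<T>(state: &mut Vec<T>, evict_mask: &[bool]) {
    assert_eq!(state.len(), evict_mask.len());
    let mut mask = evict_mask.iter();
    // `Vec::retain` visits each element exactly once, in order, so the
    // mask iterator stays aligned with the element being tested.
    state.retain(|_| !*mask.next().unwrap());
}
```

Note that evicting shifts the indices of the surviving groups, so any map from group key to group index would need to be rebuilt or remapped afterwards; the sketch leaves that part out.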
