tustvold commented on issue #7191:
URL: 
https://github.com/apache/arrow-datafusion/issues/7191#issuecomment-1665729387

   >  I needed a BiMap
   
   Yeah, I think you need both a priority queue to work out which groups to 
keep and a HashMap to work out which rows belong to which groups.
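   
   A minimal sketch of that pairing, assuming a `max(value)` aggregate with a 
descending top-K limit. `TopKGroups` and its fields are hypothetical names, not 
DataFusion APIs, and a `BTreeSet` stands in for the priority queue so that an 
existing group's entry can be removed when its value changes:

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical tracker for the K groups with the largest `max(value)`.
struct TopKGroups {
    k: usize,
    /// Ordered set used as a priority queue: smallest (value, key) first.
    queue: BTreeSet<(i64, String)>,
    /// Current aggregate value for each retained group.
    groups: HashMap<String, i64>,
}

impl TopKGroups {
    fn new(k: usize) -> Self {
        Self { k, queue: BTreeSet::new(), groups: HashMap::new() }
    }

    /// Feeds one (key, value) row; returns the key of the evicted group, if any.
    fn insert(&mut self, key: &str, value: i64) -> Option<String> {
        if let Some(current) = self.groups.get_mut(key) {
            // Existing group: only a larger value changes its max.
            if value > *current {
                self.queue.remove(&(*current, key.to_string()));
                *current = value;
                self.queue.insert((value, key.to_string()));
            }
            return None;
        }
        let mut evicted = None;
        if self.groups.len() >= self.k {
            // Full: a new group that cannot beat the current minimum is a no-op.
            let min_value = self.queue.first()?.0;
            if value <= min_value {
                return None;
            }
            // Otherwise the group holding the minimum is evicted.
            let (_, old_key) = self.queue.pop_first()?;
            self.groups.remove(&old_key);
            evicted = Some(old_key);
        }
        self.groups.insert(key.to_string(), value);
        self.queue.insert((value, key.to_string()));
        evicted
    }
}
```

   With `k = 2`, feeding `("a", 1)`, `("b", 5)` and then `("c", 0)` leaves the 
state untouched on the third row, while a later `("c", 3)` evicts `"a"`.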
   
   > I don't think we'd want to always evict groups, because we might not even 
need to add them in the first place if the value being aggregated is 
less/greater than the min/max of the priority queue - so it would be a no-op.
   
   I was envisaging something like adding support to 
`GroupValues::intern` to optionally return a list of groups to evict, 
possibly as a `BooleanBuffer`. These would effectively be groups from previous 
calls to `GroupValues::intern` that are no longer needed. This would then 
be fed to a new `GroupsAccumulator::evict` method to clear them out from the 
various aggregators, possibly using something relatively cheap like 
[`Vec::retain`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain).
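   
   To make the `Vec::retain` idea concrete, here is a rough sketch under stated 
assumptions: `MaxAccumulator` is a hypothetical stand-in for one accumulator's 
per-group state, `evict` is the proposed (not existing) method, and a plain 
`&[bool]` stands in for a `BooleanBuffer`:

```rust
/// Hypothetical per-group state: one running max per group index.
struct MaxAccumulator {
    maxes: Vec<i64>,
}

impl MaxAccumulator {
    /// Hypothetical `evict`: drops every group whose flag is set.
    /// `evict[i] == true` means group index `i` is no longer needed.
    fn evict(&mut self, evict: &[bool]) {
        assert_eq!(evict.len(), self.maxes.len());
        let mut mask = evict.iter();
        // `retain` visits each element exactly once, in order,
        // so the mask iterator stays aligned with the group index.
        self.maxes.retain(|_| !*mask.next().unwrap());
    }
}
```

   Note that `retain` compacts the vector, so surviving groups shift to lower 
indices; whatever maps rows to group indices would need the same compaction.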
   
   Or something to that effect, just spitballing here. I really want to get 
Window functions using `GroupsAccumulator` so that we can get rid of the old 
scalar accumulators (#7112)


