tustvold commented on issue #4973: URL: https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1608940763
> The per-group type dispatching overhead is not the major bottleneck I agree 100%, the issue is the per-group dispatching and state-management overheads. This is what destroys performance for high-cardinality grouping and is what the proposal aims to address. The fact the proposal also eliminates the type-dispatching overheads is a happy accident. > If we let each Accumulator manages its own state, I guess there will be more random memory access. The output of the accumulators is a columnar representation per aggregator. The current approach uses a row format to store intermediate state, converting back to this columnar representation on flush. It is unclear to me why this would be advantageous, if anything it seems like it would lead to more random memory accesses. > The random memory access and mem stall should be the major bottleneck in the case of high cardinality grouping Assuming relatively cheap aggregators like count or max, I personally would expect the hash table logic to be the bottleneck, but perhaps that is what you are alluding to here. The cost of updating the aggregators once the groups have been identified should be a rounding error if done properly in a columnar fashion. I suspect rather than having discussions about theoretical performance characteristics, it is probably worth prototyping and collecting some empirical data. I can try to find some time in the coming weeks. Whilst I expect good things, even if the performance is only an incremental improvement, I think the proposal provides a huge improvement to code quality and maintainability which I personally think is extremely important. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
