[GitHub] [arrow-datafusion] yjshen commented on issue #4973: Improve the performance of `Aggregator`, grouping, aggregation

via GitHub Fri, 03 Mar 2023 20:05:40 -0800


yjshen commented on issue #4973:
URL: 
https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1454369204


   I'm curious about the proposed new Aggregator API. Does it require each 
aggregator to keep its own hash table? I ask because I'm concerned about 
memory usage, especially for high-cardinality aggregations. 
   
   If it does, the group keys would be duplicated `n` times during execution 
(where `n` is the number of aggregators in the query). That could significantly 
increase memory consumption, which might not be acceptable. 
What do you think about this?
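
   To make the concern concrete, here is a minimal sketch of the two layouts being compared. It uses only `std::collections::HashMap` and is not the DataFusion API: the function names, the `f64` state type, and the use of a running count as the stand-in aggregate are all assumptions made for illustration. It counts only the bytes of key data held in the tables and ignores hash table overhead.

```rust
use std::collections::HashMap;

/// Layout A (hypothetical): every aggregator owns a hash table keyed by the
/// group key, so each distinct key is stored once per aggregator.
/// Returns the total number of key bytes held across all tables.
fn key_bytes_per_aggregator_tables(keys: &[&str], num_aggregators: usize) -> usize {
    let mut tables: Vec<HashMap<String, f64>> = vec![HashMap::new(); num_aggregators];
    for table in tables.iter_mut() {
        for key in keys {
            // Stand-in aggregate: a running count per group.
            *table.entry(key.to_string()).or_insert(0.0) += 1.0;
        }
    }
    tables.iter().flat_map(|t| t.keys()).map(|k| k.len()).sum()
}

/// Layout B (hypothetical): one shared table maps each group key to a group
/// index, and each aggregator keeps only a Vec of states indexed by group.
/// Returns the total number of key bytes held in the shared table.
fn key_bytes_shared_table(keys: &[&str], num_aggregators: usize) -> usize {
    let mut group_index: HashMap<String, usize> = HashMap::new();
    let mut states: Vec<Vec<f64>> = vec![Vec::new(); num_aggregators];
    for key in keys {
        // A new key gets the next free group index.
        let next = group_index.len();
        let idx = *group_index.entry(key.to_string()).or_insert(next);
        for state in states.iter_mut() {
            if idx == state.len() {
                state.push(0.0); // first row seen for this group
            }
            state[idx] += 1.0;
        }
    }
    group_index.keys().map(|k| k.len()).sum()
}
```

   With the input keys `a`, `bb`, `a`, `ccc` and three aggregators, the first function reports 18 key bytes and the second reports 6, so in this sketch the key storage grows linearly with `n` only in the per-aggregator layout.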


