[GitHub] [arrow-datafusion] yjshen commented on issue #4973: Improve the performance of `Aggregator`, grouping, aggregaton

via GitHub Tue, 07 Mar 2023 04:22:00 -0800


yjshen commented on issue #4973:
URL: https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1458077103


   > So technically NOTHING would change when this proposal is implemented
   
   I see it differently: replacing the single hash table with one hash table per aggregator is a fundamental change. After this change we would, in theory:
   1. do more hash table operations (hashing, key matching, and resize plus rehash whenever the load threshold is reached)
   2. introduce much more random memory access when updating state (_regardless of whether the state stays in the current word-aligned row format or moves to the unified row format in arrow-rs_)
      - With a contiguous row state, as in the current approach, we can load sequential words of the row into cache lines, update all fields at once, and proceed to the next group-by key.
      - With per-aggregator state, each cache line is filled with adjacent but irrelevant data: the same aggregate's state for other groups, not the other aggregates' state for the current group.
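
To make the two layouts concrete, here is a minimal sketch, not DataFusion's actual code. It assumes three hypothetical aggregates (COUNT, SUM, MAX) over `i64` values grouped by a `u32` key, and uses `std::collections::HashMap` in place of the real hash table. `RowState`, `PerAggregatorTables`, and both function names are invented for illustration.

```rust
use std::collections::HashMap;

/// Layout A (single table): each group key maps to one contiguous row
/// holding the state of every aggregate. Names here are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RowState {
    pub count: u64,
    pub sum: i64,
    pub max: i64,
}

pub fn aggregate_single_table(keys: &[u32], values: &[i64]) -> HashMap<u32, RowState> {
    let mut table: HashMap<u32, RowState> = HashMap::new();
    for (&k, &v) in keys.iter().zip(values) {
        // One hash and one probe per input row; the three fields sit
        // next to each other in memory and are updated together.
        let row = table.entry(k).or_insert(RowState {
            count: 0,
            sum: 0,
            max: i64::MIN,
        });
        row.count += 1;
        row.sum += v;
        row.max = row.max.max(v);
    }
    table
}

/// Layout B (per-aggregator tables): every aggregate owns its own table.
pub struct PerAggregatorTables {
    pub count: HashMap<u32, u64>,
    pub sum: HashMap<u32, i64>,
    pub max: HashMap<u32, i64>,
}

pub fn aggregate_per_aggregator(keys: &[u32], values: &[i64]) -> PerAggregatorTables {
    let mut t = PerAggregatorTables {
        count: HashMap::new(),
        sum: HashMap::new(),
        max: HashMap::new(),
    };
    for (&k, &v) in keys.iter().zip(values) {
        // Three hashes and three probes per input row, each landing in a
        // different allocation, so the state for one group is scattered.
        *t.count.entry(k).or_insert(0) += 1;
        *t.sum.entry(k).or_insert(0) += v;
        let m = t.max.entry(k).or_insert(i64::MIN);
        *m = (*m).max(v);
    }
    t
}
```

Both functions compute the same results; they differ only in the number of hash table operations per input row and in where the state for a given group lives. A real per-aggregator design would more likely process a whole batch per aggregator rather than interleave the updates row by row, but the total number of probes stays the same.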
   
   If the current inefficiency comes from too much dynamic (`dyn`) dispatch, I would rather try JIT compilation.
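
For readers unfamiliar with the dispatch cost in question, the following is a rough sketch of the difference between per-value dynamic dispatch and statically dispatched updates. The `Accumulator` trait and `Sum` type are simplified stand-ins invented for this example, not DataFusion's actual interfaces, and the sketch does not show JIT itself, only the overhead it would aim to remove.

```rust
/// Hypothetical, simplified accumulator interface (an assumption made
/// for this example, not DataFusion's real trait).
pub trait Accumulator {
    fn update(&mut self, value: i64);
    fn result(&self) -> i64;
}

#[derive(Default)]
pub struct Sum(pub i64);

impl Accumulator for Sum {
    fn update(&mut self, value: i64) {
        self.0 += value;
    }
    fn result(&self) -> i64 {
        self.0
    }
}

/// Dynamic dispatch: one virtual call per accumulator per input value,
/// which the compiler generally cannot inline.
pub fn update_dyn(accs: &mut [Box<dyn Accumulator>], values: &[i64]) {
    for &v in values {
        for acc in accs.iter_mut() {
            acc.update(v);
        }
    }
}

/// Static dispatch: the function is monomorphized for `A`, so `update`
/// can be inlined into the loop body.
pub fn update_static<A: Accumulator>(acc: &mut A, values: &[i64]) {
    for &v in values {
        acc.update(v);
    }
}
```

Generated (JIT) code would target roughly what `update_static` achieves, but for the combination of aggregates known only at query time.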
       


