yjshen commented on issue #4973:
URL:
https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1458077103
> So technically NOTHING would change when this proposal is implemented
I see it differently: moving from one single hash table to a hash table per
aggregator is a fundamental change. After this change we would, in theory:
1. do more hash table operations (hashing, key matching, and resize & rehash
when the load threshold is reached)
2. introduce much more random memory access during state updates (_no matter
whether the state is the current word-aligned row or is changed to use the
unified row in arrow-rs_)
   - With a contiguous row state, as in the current approach, we can load
   sequential words of it into cache lines, update all fields at once, and
   proceed to the next group-by key.
   - With per-aggregator state, the cache line will always be filled with
   adjacent but irrelevant state data, i.e. state that belongs to other groups.
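
To make the two points above concrete, here is a minimal sketch, not
DataFusion's actual code. It assumes three aggregates (sum, count, max) over
`u32` group keys; the function names `single_table` and
`per_aggregator_tables` are hypothetical. It only counts hash table lookups
and shows where the state for one group lives; it does not measure cache
behavior.

```rust
use std::collections::HashMap;

// Layout A (hypothetical): one hash table, each group key maps to one
// contiguous row holding every aggregate state: [sum, count, max].
// Returns the table and the number of hash table lookups performed.
fn single_table(keys: &[u32], vals: &[i64]) -> (HashMap<u32, [i64; 3]>, usize) {
    let mut table: HashMap<u32, [i64; 3]> = HashMap::new();
    let mut lookups = 0usize;
    for (&k, &v) in keys.iter().zip(vals) {
        // One lookup per input row; all three fields sit next to each other.
        lookups += 1;
        let row = table.entry(k).or_insert([0, 0, i64::MIN]);
        row[0] += v;
        row[1] += 1;
        row[2] = row[2].max(v);
    }
    (table, lookups)
}

// Layout B (hypothetical): one hash table per aggregator. The same key is
// hashed and matched once per aggregator, and the three states for a group
// live in three unrelated allocations.
// Returns [sum table, count table, max table] and the lookup count.
fn per_aggregator_tables(keys: &[u32], vals: &[i64]) -> ([HashMap<u32, i64>; 3], usize) {
    let mut sum: HashMap<u32, i64> = HashMap::new();
    let mut count: HashMap<u32, i64> = HashMap::new();
    let mut max: HashMap<u32, i64> = HashMap::new();
    let mut lookups = 0usize;
    for (&k, &v) in keys.iter().zip(vals) {
        lookups += 1;
        *sum.entry(k).or_insert(0) += v;

        lookups += 1;
        *count.entry(k).or_insert(0) += 1;

        lookups += 1;
        let m = max.entry(k).or_insert(i64::MIN);
        *m = (*m).max(v);
    }
    ([sum, count, max], lookups)
}
```

For N input rows and A aggregators, this sketch performs N lookups in the
single-table layout and N * A lookups in the per-aggregator layout, while
producing the same aggregate values.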
If the current inefficiency comes from too much dynamic dispatch, I would
rather try JIT compilation first.