sundy-li commented on issue #846:
URL:
https://github.com/apache/arrow-datafusion/issues/846#issuecomment-895763475
> With grouping the values in one value I am wondering whether it's good enough for the hashtable? Or would you hash that again?

We don't need to worry about rehashing inside the hashmap; that is the hashmap's own concern. We only need to ensure that each key is unique: fixed-width keys are packed into a single number, and string keys can be reduced with a 256-bit hash (hash256). Key collisions are not a practical concern, since producing one is about as likely as cracking a bank's password.

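
As an illustration of "fixed keys are represented as Number", here is a minimal sketch that assumes two fixed-width group-by columns (a `u32` and a `u16`). The helper `pack_fixed_key` is hypothetical, not a DataFusion or ClickHouse function. Each column gets its own bit range in a `u64`, so distinct column pairs always yield distinct keys, and the standard `HashMap` takes care of its own hashing and resizing. String keys (the 256-bit hash case) are not covered here.

```rust
use std::collections::HashMap;

/// Hypothetical helper: packs a u32 column and a u16 column into one u64 key.
/// `a` occupies bits 16..48 and `b` occupies bits 0..16, so the packing is
/// injective: different (a, b) pairs can never produce the same key.
fn pack_fixed_key(a: u32, b: u16) -> u64 {
    ((a as u64) << 16) | (b as u64)
}

fn main() {
    let rows: [(u32, u16); 4] = [(1, 7), (2, 7), (1, 7), (1, 8)];
    // We only supply unique keys; rehashing/resizing is the HashMap's job.
    let mut counts: HashMap<u64, usize> = HashMap::new();
    for &(a, b) in &rows {
        *counts.entry(pack_fixed_key(a, b)).or_insert(0) += 1;
    }
    println!("{} distinct groups", counts.len());
    println!("rows in group (1, 7): {}", counts[&pack_fixed_key(1, 7)]);
}
```

Wider key combinations would need a wider integer (e.g. `u128`), which is the same idea with more bits.
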
With that key, we look up the unique `AggregateFunctionState` in the HashMap and then calculate/merge the current row into that state. The block itself is therefore never reordered with `take`; only the state is modified, and each aggregate function needs just one `function or expr`.

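
A minimal sketch of this per-key state approach, assuming an AVG aggregate over integer keys. `AvgState` and `group_avg` are hypothetical names standing in for `AggregateFunctionState` and the aggregation loop; they are not DataFusion or ClickHouse APIs. The input slices are only read in a single pass, and each row is folded into the state for its key.

```rust
use std::collections::HashMap;

/// Hypothetical per-group state for AVG (a stand-in for `AggregateFunctionState`).
#[derive(Default, Debug)]
struct AvgState {
    sum: f64,
    count: u64,
}

impl AvgState {
    /// Fold one input row into the state.
    fn update(&mut self, value: f64) {
        self.sum += value;
        self.count += 1;
    }

    /// Produce the final aggregate value for this group.
    fn finalize(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Aggregates `values` grouped by `keys` in one pass over the block.
/// The inputs are never reordered or copied with `take`; only states change.
fn group_avg(keys: &[u64], values: &[f64]) -> HashMap<u64, AvgState> {
    let mut states: HashMap<u64, AvgState> = HashMap::new();
    for (&key, &value) in keys.iter().zip(values) {
        states.entry(key).or_default().update(value);
    }
    states
}

fn main() {
    let keys: [u64; 5] = [1, 2, 1, 2, 3];
    let values: [f64; 5] = [10.0, 4.0, 20.0, 6.0, 7.0];
    let states = group_avg(&keys, &values);
    let mut result: Vec<(u64, f64)> = states.iter().map(|(k, s)| (*k, s.finalize())).collect();
    result.sort_by_key(|(k, _)| *k);
    println!("{:?}", result);
}
```

Because a row only touches the state of its own group, adding another aggregate (e.g. MAX) means adding another state type with its own `update`, not another pass that regroups the block.
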
Refer to the ClickHouse design:
https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/Aggregator.h#L580-L625
Why is ClickHouse's GROUP BY so fast?
https://bohutang.me/2021/01/21/clickhouse-and-friends-groupby/