sundy-li commented on issue #846:
URL: https://github.com/apache/arrow-datafusion/issues/846#issuecomment-895763475


   > With grouping the values in one value I am wondering whether it's good 
enough for the hashtable? Or would you hash that again?
   
   We don't need to care about rehashing inside the hash map; that is the hash
map's own concern.
    We only need to ensure that the key is unique. Fixed-width keys are
represented as a number, and string keys can use a hash256 method. We don't
need to worry about key collisions, because finding one is about as hard as
cracking a bank's password.
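
As a minimal sketch of the fixed-width case, assume a GROUP BY over two hypothetical columns, a `u32` and a `u16`. Their values fit into one `u64` without loss, so the packed number is unique per group. The function names below are illustrative only and are not taken from DataFusion or ClickHouse. String keys would need a 256-bit hash from an external crate, which is not shown here.

```rust
/// Pack two fixed-width group-by columns into a single u64 key.
/// The layout is an assumption for illustration: `a` sits in bits 16..48
/// and `b` in bits 0..16, so distinct (a, b) pairs give distinct keys.
fn pack_key(a: u32, b: u16) -> u64 {
    ((a as u64) << 16) | (b as u64)
}

/// Recover the original column values from a packed key.
fn unpack_key(key: u64) -> (u32, u16) {
    ((key >> 16) as u32, (key & 0xFFFF) as u16)
}
```

Because the packing is reversible, the key itself identifies the group, and the hash map is free to hash the `u64` however it likes.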
    
    With that key, we can look up the unique `AggregateFunctionState` in the
HashMap and then calculate/merge the current row into that state. So the block
is not modified by `take`; we only modify the state, and we need only one
`function or expr` for each aggregate function.
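
The per-row update described above can be sketched roughly as follows. `SumState` is a hypothetical stand-in for `AggregateFunctionState`, and `aggregate` is an illustrative helper, not an existing DataFusion or ClickHouse API. The sketch assumes each row already has a unique `u64` group key.

```rust
use std::collections::HashMap;

/// Hypothetical per-group state for a SUM/COUNT style aggregate.
#[derive(Debug, Default, PartialEq)]
struct SumState {
    sum: i64,
    count: u64,
}

impl SumState {
    /// Merge one input row into this group's state.
    fn update(&mut self, value: i64) {
        self.sum += value;
        self.count += 1;
    }
}

/// Walk the rows once: look up (or create) the state for each row's key
/// and update it in place. The input slices are only read, never reordered.
fn aggregate(keys: &[u64], values: &[i64]) -> HashMap<u64, SumState> {
    assert_eq!(keys.len(), values.len());
    let mut states: HashMap<u64, SumState> = HashMap::new();
    for (&key, &value) in keys.iter().zip(values) {
        states.entry(key).or_default().update(value);
    }
    states
}
```

Only the states are mutated; the input columns are left untouched, which matches the point that the block is not modified by `take`.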
    
    See the ClickHouse design for reference:
    
    
https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/Aggregator.h#L580-L625
    
    Why is ClickHouse's group-by so fast?
   
    https://bohutang.me/2021/01/21/clickhouse-and-friends-groupby/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
