Dandandan edited a comment on pull request #8765: URL: https://github.com/apache/arrow/pull/8765#issuecomment-733824217
@jorgecarleitao Not really on performance as the current benchmarks / queries show; I am just looking at ways to improve aggregate / join performance. The main thing I wanted to investigate is whether the aggregates / joins themselves can be made faster. I think one part would be to create a key that can be hashed faster: currently the hashing algorithm hashes each individual `GroupByValue` instead of working on a single byte array, and the latter could in principle be faster. Some specialized code could also be written for hashing based on one column only.

The current representation can also have a larger impact on _**memory usage**_ if you are hashing / aggregating something with high cardinality, as each key carries tens of extra bytes: 16 bytes for each `GroupByValue`, 8 bytes for using `Vec`, and 8 bytes for boxing the inner `Vec` of the aggregation.
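To illustrate the byte-array idea, here is a minimal, hypothetical sketch (not the DataFusion implementation): a `composite_key` helper encodes the group-by columns of a row into one contiguous `Vec<u8>`, so the hash map hashes a single byte slice instead of a `Vec` of per-column values. It assumes fixed-width columns; variable-length types such as strings would need length prefixes or delimiters to keep keys unambiguous.

```rust
use std::collections::HashMap;

// Hypothetical helper: pack two fixed-width group-by columns (i64, u32)
// into one contiguous byte buffer that serves as the hash-map key.
fn composite_key(a: i64, b: u32) -> Vec<u8> {
    let mut key = Vec::with_capacity(12);
    key.extend_from_slice(&a.to_le_bytes()); // 8 bytes for the i64 column
    key.extend_from_slice(&b.to_le_bytes()); // 4 bytes for the u32 column
    key
}

fn main() {
    // Aggregate COUNT(*) grouped by (a, b) over a few sample rows.
    let rows = [(1i64, 2u32), (1, 2), (3, 4)];
    let mut counts: HashMap<Vec<u8>, u64> = HashMap::new();
    for &(a, b) in rows.iter() {
        *counts.entry(composite_key(a, b)).or_insert(0) += 1;
    }
    assert_eq!(counts.len(), 2); // two distinct groups: (1, 2) and (3, 4)
    assert_eq!(counts[&composite_key(1, 2)], 2);
    println!("distinct groups: {}", counts.len());
}
```

Hashing one contiguous buffer per row also sidesteps the per-value overhead mentioned above, at the cost of an encoding step when building the key.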