alamb opened a new issue, #2723: URL: https://github.com/apache/arrow-datafusion/issues/2723
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

As part of the transition to the faster "row format" (#1861), @yjshen implemented a row-based hash aggregate in https://github.com/apache/arrow-datafusion/pull/2375 ❤️

However, that implementation currently supports only a subset of the data types that DataFusion supports. This made the code significantly faster in some cases, but it has some downsides:

1. Not all data types benefit from the row format's performance.
2. There are two parallel implementations of hash aggregate that are similar but not identical: [`row_hash.rs`](https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_plan/aggregates/row_hash.rs) and [`hash.rs`](https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_plan/aggregates/hash.rs).

You can already see the potential challenge in PRs like https://github.com/apache/arrow-datafusion/pull/2716, where test coverage may accidentally miss one of the two hash aggregate implementations.

**Describe the solution you'd like**

I would like to consolidate the hash aggregate implementations. Success means deleting `hash.rs` by adding support for the remaining data types to `row_hash.rs`.

I think this would be a nice project for someone new to DataFusion: the pattern is already defined, the outcome will be better performance, and they will gain good experience with the code. It will also broaden the row format's type support and make the format easier to roll out through the rest of the codebase.

**Describe alternatives you've considered**

N/A

**Additional context**

More context about the ongoing row format conversion is in https://github.com/apache/arrow-datafusion/issues/1861
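To illustrate the split described above, here is a minimal, std-only Rust sketch of the general idea behind row-based hash aggregation: group-by values are encoded into one contiguous byte buffer per row, that buffer is the hash-map key, and a type check decides whether the row path can be used at all. Everything here is hypothetical (`ColType`, `Value`, `row_supported`, `encode_row`, `row_hash_sum`); it does not use Arrow and is not DataFusion's actual row format or API. It only shows why an unsupported type forces a second, non-row code path.

```rust
use std::collections::HashMap;

/// Hypothetical column types (NOT DataFusion's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Int64,
    Utf8,
    Decimal128, // stands in for a type the row path does not handle yet
}

/// Hypothetical check: if any type is unsupported, a caller would have to
/// fall back to a separate (columnar) implementation.
fn row_supported(types: &[ColType]) -> bool {
    types.iter().all(|t| matches!(t, ColType::Int64 | ColType::Utf8))
}

/// One group-by value in a toy row.
#[derive(Debug, Clone)]
enum Value {
    Int64(i64),
    Utf8(String),
}

/// Encode one row of group-by values into a contiguous byte buffer.
/// Ints are written as 8 little-endian bytes; strings are length-prefixed
/// so that ("ab", "c") and ("a", "bc") produce different keys.
fn encode_row(values: &[Value]) -> Vec<u8> {
    let mut buf = Vec::new();
    for v in values {
        match v {
            Value::Int64(i) => buf.extend_from_slice(&i.to_le_bytes()),
            Value::Utf8(s) => {
                buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
                buf.extend_from_slice(s.as_bytes());
            }
        }
    }
    buf
}

/// Toy SUM(x) GROUP BY keys: the encoded row bytes are the hash-map key.
fn row_hash_sum(keys: &[Vec<Value>], x: &[i64]) -> HashMap<Vec<u8>, i64> {
    let mut groups: HashMap<Vec<u8>, i64> = HashMap::new();
    for (row, &val) in keys.iter().zip(x) {
        *groups.entry(encode_row(row)).or_insert(0) += val;
    }
    groups
}

fn main() {
    assert!(row_supported(&[ColType::Utf8, ColType::Int64]));
    assert!(!row_supported(&[ColType::Decimal128]));

    let keys = vec![
        vec![Value::Utf8("a".into()), Value::Int64(1)],
        vec![Value::Utf8("b".into()), Value::Int64(1)],
        vec![Value::Utf8("a".into()), Value::Int64(1)],
    ];
    let sums = row_hash_sum(&keys, &[10, 20, 5]);
    let key_a = encode_row(&[Value::Utf8("a".into()), Value::Int64(1)]);
    // prints "groups = 2, sum(a,1) = 15"
    println!("groups = {}, sum(a,1) = {}", sums.len(), sums[&key_a]);
}
```

In this toy model, "adding type support" means extending both `row_supported` and `encode_row` for a new variant, after which the fallback path is no longer needed for that type.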
