alamb opened a new issue, #2723:
URL: https://github.com/apache/arrow-datafusion/issues/2723

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   As part of the transition to the faster "row format" (#1861), @yjshen 
implemented a row-based hash aggregate in 
https://github.com/apache/arrow-datafusion/pull/2375 ❤️ 
   
   However, the implementation currently supports only a subset of the 
data types that DataFusion supports. This made the code significantly faster 
for some cases but has some downsides:
   
   1. Not all data types benefit from the row format's performance
   2. There are two parallel implementations of hash aggregate that are 
similar but not identical: 
[`row_hash.rs`](https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_plan/aggregates/row_hash.rs)
 and 
[`hash.rs`](https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_plan/aggregates/hash.rs)
   
   You can already see the potential challenge in PRs like 
https://github.com/apache/arrow-datafusion/pull/2716, where test coverage may 
accidentally miss one of the hash aggregate implementations.
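   
   To make the split described above concrete, here is a minimal sketch of how a type-support check can route a query to one of two implementations. Everything in it is a hypothetical stand-in: the `DataType` and `AggImpl` enums, `row_supported`, `choose_impl`, and the particular set of supported types are assumptions for illustration, not DataFusion's actual API or its actual supported-type list.

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Boolean,
    Utf8,
}

// Which hash aggregate implementation handles a query.
#[derive(Debug, PartialEq)]
enum AggImpl {
    RowHash,    // stands in for row_hash.rs
    ColumnHash, // stands in for hash.rs
}

/// Assumption for illustration only: the row format handles
/// fixed-width primitive types and nothing else.
fn row_supported(types: &[DataType]) -> bool {
    types.iter().all(|t| {
        matches!(t, DataType::Int64 | DataType::Float64 | DataType::Boolean)
    })
}

/// Use the row-based path only when every column type is supported;
/// otherwise fall back to the older implementation.
fn choose_impl(types: &[DataType]) -> AggImpl {
    if row_supported(types) {
        AggImpl::RowHash
    } else {
        AggImpl::ColumnHash
    }
}
```

   Under these assumptions, a single unsupported column (for example `Utf8`) sends the whole aggregation to the fallback path, so a test that only uses supported types never exercises the second implementation.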
   
   **Describe the solution you'd like**
   I would like to consolidate the hash aggregate implementations -- success 
means deleting `hash.rs` after adding the remaining type support to 
`row_hash.rs`. 
   
   I think this would be a nice project for someone new to DataFusion to work 
on as the pattern is already defined, the outcome will be better performance, 
and they will get good experience with the code. 
   
   It will also increase the type support for row format and make it easier to 
roll out through the rest of the codebase.
   
   **Describe alternatives you've considered**
   N/A 
   
   **Additional context**
   More context about the ongoing row format conversion is in 
https://github.com/apache/arrow-datafusion/issues/1861

