alamb opened a new issue, #6969:
URL: https://github.com/apache/arrow-datafusion/issues/6969

   ### Is your feature request related to a problem or challenge?
   
   This is another observation to make aggregation queries go faster 
   
   After https://github.com/apache/arrow-datafusion/pull/6904 the speed of 
grouping is much faster as much of the per-aggregate state overhead is removed.
   
   Now, for queries with a single primitive type (e.g. Int64) as their grouping 
key is dominated by the creation of the arrow `Row` format: 
https://docs.rs/arrow-row/43.0.0/arrow_row/index.html
   
   An example of such a query is to find the number of interactions with some 
user id:
   
   ```sql
   SELECT count(*)
   FROM t
   GROUP BY user_id
   ```
   
   
   
   ### Describe the solution you'd like
   
   As @Dandandan  suggested on 
https://github.com/apache/arrow-datafusion/pull/6800#issuecomment-1627472688 
   
   > we should also consider avoiding the conversion to the row-format as this 
likely will be one of the more expensive things now.
   
   So that would mean that for group by with a single column, we could use the 
(native) representation of that column to store the group values rather than 
using the Arrow Row format. 
   
   This is the same approach taken today by Sort / SortPreservingMerge. It is 
implemented via a generic `FieldCursor`:
   
   
https://github.com/apache/arrow-datafusion/blob/d316702722e6c301fdb23a9698f7ec415ef548e9/datafusion/core/src/physical_plan/sorts/cursor.rs#L180-L191
   
   ### Describe alternatives you've considered
   
   @yahoNanJing  has also been exploring a fixed width row cursor 
implementation that is optimized for comparison rather than ordering: 
https://github.com/apache/arrow-rs/pull/4524
   
   ### Additional context
   
   There is some discussion in the ASF slack channel as well: 
https://the-asf.slack.com/archives/C01QUFS30TD/p1689301661826909


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to