Dandandan opened a new issue #338: URL: https://github.com/apache/arrow-datafusion/issues/338
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Currently `.to_array()` is called on each scalar value, which is slow and generates a lot of allocations. This causes three problems:

* There is overhead for generating arrays in this way.
* The single-row arrays are concatenated at the end, which is slow and would be unnecessary if the arrays were built in larger chunks to begin with.
* Intermediate `Vec`s are generated, causing more memory usage / allocations / fragmentation.

I expect fixing this to speed up some db-benchmark queries (group by queries with smaller groups) considerably, and it may decrease memory usage by quite a bit.

**Describe the solution you'd like**

Iterate over the values and emit arrays of `batch_size` elements at once. Or, as a first step, build the arrays for all of the values at once (all values are already emitted together currently) and emit smaller batches in a later PR. To do it with `batch_size`, some state has to be kept between batches and/or the emitted groups have to be removed from the map.

**Describe alternatives you've considered**

n/a

**Additional context**

n/a
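To make the difference concrete, here is a minimal sketch of the two approaches using only the Rust standard library. It is not DataFusion or Arrow code: `Scalar`, `Column`, `per_row_then_concat`, and `emit_batches` are hypothetical stand-ins for `ScalarValue`, an Arrow array, the current `.to_array()`-per-value path, and the proposed batched path.

```rust
/// Hypothetical stand-in for a scalar aggregate result
/// (an assumption for illustration, not DataFusion's `ScalarValue`).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(i64),
}

/// Hypothetical stand-in for a columnar array (not an Arrow array).
type Column = Vec<i64>;

/// The pattern described as slow: build one single-row column per scalar,
/// collect them in an intermediate `Vec`, then concatenate at the end.
fn per_row_then_concat(values: &[Scalar]) -> Column {
    let single_rows: Vec<Column> = values
        .iter()
        .map(|s| {
            let Scalar::Int64(v) = s;
            vec![*v] // one allocation per row
        })
        .collect();
    single_rows.concat() // extra copy of every value
}

/// The proposed pattern: iterate over the values once and emit columns of
/// at most `batch_size` elements, with no single-row intermediates.
fn emit_batches(values: impl IntoIterator<Item = Scalar>, batch_size: usize) -> Vec<Column> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batches = Vec::new();
    let mut current = Column::with_capacity(batch_size);
    for value in values {
        let Scalar::Int64(v) = value;
        current.push(v);
        if current.len() == batch_size {
            // Hand off the full batch and start a fresh buffer.
            let full = std::mem::replace(&mut current, Column::with_capacity(batch_size));
            batches.push(full);
        }
    }
    // Emit the final, possibly shorter, batch.
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Because `emit_batches` takes its input by value, a caller could pass something like a draining iterator over the group map, which would correspond to the "remove the groups from the map" option. Passing `batch_size = usize::MAX` would not be a good way to get the "all values at once" first step, since the sketch pre-allocates `batch_size` slots; a real implementation would size the buffer from the number of groups.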
