coady opened a new issue, #33832: URL: https://github.com/apache/arrow/issues/33832
### Describe the enhancement requested

Spun off from #33825.

`pc._group_by` requires arrays that are already loaded in memory, which makes it impractical for grouping and aggregating large datasets. Even if the dataset is partitioned, there is no built-in way to aggregate by partition keys; iterating over `get_fragments` is the only option.

Another variant would be to aggregate by batch, but that would still require grouping, and in practice it would limit the aggregations to simple associative ones, e.g., `sum`. So focusing on fragments first is probably better.

### Component(s)

C++, Python
