[GitHub] [arrow] coady opened a new issue, #33832: [C++][Python] Performant aggregating by fragments.

via GitHub Sun, 22 Jan 2023 17:49:42 -0800


coady opened a new issue, #33832:
URL: https://github.com/apache/arrow/issues/33832


   ### Describe the enhancement requested
   
   Spun off from #33825.
   
   `pc._group_by` operates on arrays that are already loaded in memory, so it is 
not practical for grouping and aggregating large datasets. Even when the dataset 
is partitioned, there is no built-in way to aggregate by partition keys; 
iterating over `get_fragments` is the only option.
   
   An alternative would be to aggregate batch by batch, but each batch would 
still need grouping, and in practice the aggregations would be limited to simple 
associative ones such as `sum`. Focusing on fragments first is probably better.
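
A rough sketch of why per-batch aggregation favours associative operations follows. It uses plain Python lists as hypothetical stand-ins for record batches and does not call any Arrow API. Per-batch sums can simply be added together, whereas something like a mean can only be recovered if extra state (here, row counts) is carried alongside the sums.

```python
from collections import Counter

# Hypothetical stand-ins for record batches: each batch is a list of (key, value) rows.
batches = [
    [("a", 1), ("b", 3)],
    [("a", 2), ("b", 4), ("b", 5)],
]


def partial_aggregate(batch):
    """Group one batch by key and return its per-key sums and row counts."""
    sums, counts = Counter(), Counter()
    for key, value in batch:
        sums[key] += value
        counts[key] += 1
    return sums, counts


total_sums, total_counts = Counter(), Counter()
for batch in batches:
    sums, counts = partial_aggregate(batch)
    # Counter.update adds values per key; merging partial results this way
    # is valid because addition is associative.
    total_sums.update(sums)
    total_counts.update(counts)

# A mean is recoverable only because both sums and counts were carried along.
means = {key: total_sums[key] / total_counts[key] for key in total_sums}
print(dict(total_sums), means)
```

Note that every batch still has to be grouped by key before its partial result can be merged, which matches the concern raised above.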
   
   
   ### Component(s)
   
   C++, Python


