coady opened a new issue, #33825: URL: https://github.com/apache/arrow/issues/33825
### Describe the enhancement requested

It's not possible to programmatically determine the values of partition keys in a fragment. Fragments have a `partition_expression` attribute, but the `Expression` type doesn't allow any further introspection. I don't want to have to parse the string representation of the expression.

```python
In []: dataset.partitioning.schema
Out[]:
year: int32
month: int32

In []: fragment = next(dataset.get_fragments())

In []: fragment.partition_expression
Out[]: <pyarrow.compute.Expression ((year == 2013) and (month == 1))>
```

My broader use case is more performant (speed and memory) aggregation of partitioned data. Using `pc._group_by` requires loaded arrays, so it ignores that the data is already partitioned. And iterating `get_fragments` is crippled if one can't identify the fragment.

### Component(s)

Python

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
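
For the use case described in the issue, one possible stopgap is to read the keys from the fragment's file path rather than from the expression. The sketch below is only an illustration and rests on assumptions not stated in the issue: that the dataset uses Hive-style directory names (`year=2013/month=1/...`) and that the string passed in comes from something like `fragment.path`. The helper name `partition_keys_from_path` is hypothetical, and the values come back as strings, so casting them to the types in `dataset.partitioning.schema` is left to the caller.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote


def partition_keys_from_path(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a Hive-style fragment path.

    Hypothetical helper: assumes Hive-style partition directories.
    """
    keys = {}
    # Skip the final component, which is the data file itself.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            # Undo percent-encoding in case the value was URI-encoded.
            keys[key] = unquote(value)
    return keys


# Example path is made up for illustration.
print(partition_keys_from_path("data/year=2013/month=1/part-0.parquet"))
```

This does not help with directory partitioning that omits the key names from the path, which is part of why introspection of `partition_expression` itself would still be needed.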
