[GitHub] [arrow] coady opened a new issue, #33825: [Python] Introspect partition keys and values in fragments.

via GitHub Sat, 21 Jan 2023 15:04:00 -0800


coady opened a new issue, #33825:
URL: https://github.com/apache/arrow/issues/33825


   ### Describe the enhancement requested
   
   It's not possible to programmatically determine the values of partition keys 
in a fragment. Fragments have a `partition_expression` attribute, but the 
`Expression` type doesn't allow any further introspection. I don't want to have 
to parse the string representation of the expression.
   ```python
   In []: dataset.partitioning.schema
   Out[]: 
   year: int32
   month: int32
   
   In []: fragment = next(dataset.get_fragments())
   
   In []: fragment.partition_expression
   Out[]: <pyarrow.compute.Expression ((year == 2013) and (month == 1))>
   ```
   
   My broader use case is faster, more memory-efficient aggregation of 
partitioned data. `pc._group_by` operates on arrays that are already loaded, so 
it cannot take advantage of the existing partitioning. Iterating over 
`get_fragments` is of limited use if each fragment's partition keys can't be 
identified.
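
As a stopgap, the keys could be recovered from a fragment's file path rather than from its expression. The following is only a minimal sketch under stated assumptions: the dataset uses a Hive-style directory layout (`key=value` segments), the path comes from something like `fragment.path`, and the helper name `partition_keys_from_path` and its `casts` parameter are hypothetical, not part of pyarrow. It does not help with other partitioning schemes, which is why proper introspection of the expression would still be preferable.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote


def partition_keys_from_path(path, casts=None):
    """Return {key: value} parsed from Hive-style `key=value` path segments.

    Hypothetical helper: `casts` optionally maps a key name to a callable
    that converts the raw string value (values are left as str otherwise).
    """
    casts = casts or {}
    keys = {}
    # Only look at directories; the last component is the data file itself.
    for segment in PurePosixPath(path).parent.parts:
        name, sep, raw = segment.partition("=")
        if not sep or not name:
            continue  # not a `key=value` directory
        value = unquote(raw)  # assumption: values may be percent-encoded
        keys[name] = casts.get(name, str)(value)
    return keys


# Hypothetical path, shaped like the year/month partitioning shown above.
example = "data/year=2013/month=1/part-0.parquet"
print(partition_keys_from_path(example, casts={"year": int, "month": int}))
```

With keys available per fragment, each fragment could then be aggregated on its own and the results labeled by partition, without loading the whole dataset first.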
   
   ### Component(s)
   
   Python


