jorisvandenbossche commented on pull request #7691: URL: https://github.com/apache/arrow/pull/7691#issuecomment-656143156
Not the cleanest solution, but I could do this relatively quickly because it's based on what I did earlier in https://github.com/apache/arrow/pull/7523. I think a more proper solution won't be possible before 1.0, and this at least gives a way to get the information needed.

A few examples:

```python
In [1]: import pyarrow.dataset as ds

In [2]: dataset = ds.dataset("test_filter_fragments_pandas/", format="parquet", partitioning="hive")

In [4]: expr = list(dataset.get_fragments())[0].partition_expression

# single partition level with a string
In [5]: expr
Out[5]: <pyarrow.dataset.Expression (part == A:string)>

In [6]: ds._unwrap_partition_expression(expr)
Out[6]: [('part', 'A')]

In [7]: dataset = ds.dataset("test_parquet_dask/", format="parquet", partitioning="hive")

In [8]: expr = list(dataset.get_fragments())[0].partition_expression

# two partition levels with integers
In [9]: expr
Out[9]: <pyarrow.dataset.Expression ((year == 2016:int32) and (month == 1:int32))>

In [10]: ds._unwrap_partition_expression(expr)
Out[10]: [('year', 2016), ('month', 1)]

In [11]: dataset = ds.dataset("test.parquet", format="parquet")

In [12]: expr = list(dataset.get_fragments())[0].partition_expression

# non-partitioned dataset
In [13]: expr
Out[13]: <pyarrow.dataset.Expression true:bool>

In [14]: ds._unwrap_partition_expression(expr)
Out[14]: []
```

cc @rjzamora
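As a rough illustration of how a consumer might use the `(name, value)` pairs shown in the outputs above, here is a minimal sketch in plain Python. The helper names `partition_keys_to_dict` and `partition_keys_to_hive_path` are hypothetical and not part of pyarrow; the input lists are copied from the example outputs rather than produced by calling pyarrow.

```python
from typing import Any, Dict, List, Tuple

# A list of (field name, value) pairs, in the shape shown in the outputs above.
PartitionKeys = List[Tuple[str, Any]]


def partition_keys_to_dict(keys: PartitionKeys) -> Dict[str, Any]:
    """Hypothetical helper: map each partition field name to its value."""
    return dict(keys)


def partition_keys_to_hive_path(keys: PartitionKeys) -> str:
    """Hypothetical helper: build a hive-style 'name=value/...' path segment."""
    return "/".join(f"{name}={value}" for name, value in keys)


# Inputs copied from the example outputs above (not computed by pyarrow here).
single_level = [("part", "A")]
two_levels = [("year", 2016), ("month", 1)]
unpartitioned = []

print(partition_keys_to_dict(two_levels))
print(partition_keys_to_hive_path(single_level))
print(partition_keys_to_hive_path(two_levels))
print(repr(partition_keys_to_hive_path(unpartitioned)))
```

An empty list, as returned for a non-partitioned dataset, yields an empty dict and an empty path segment, so callers would not need a special case for it.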
