sanjibansg commented on PR #13260:
URL: https://github.com/apache/arrow/pull/13260#issuecomment-1141591091

> > It may be because of the changed definition of the `Parse()` method. See the changed test case here, [adb5b00#diff-063096b5f67bfb7437c260bf755259ba4251db3ff7e6aca6ff0cb554181b4461R607](https://github.com/apache/arrow/commit/adb5b00023bf57122c6de540746008eff2d1ad92#diff-063096b5f67bfb7437c260bf755259ba4251db3ff7e6aca6ff0cb554181b4461R607). We would need to change the example string for the `Parse()` method accordingly.
>
> Applying this diff fixes it locally:
>
> ```
> $ git diff
> diff --git a/python/pyarrow/_dataset.pyx b/python/pyarrow/_dataset.pyx
> index 83c131c..fbde03f 100644
> --- a/python/pyarrow/_dataset.pyx
> +++ b/python/pyarrow/_dataset.pyx
> @@ -1468,7 +1468,7 @@ cdef class DirectoryPartitioning(KeyValuePartitioning):
>      >>> from pyarrow.dataset import DirectoryPartitioning
>      >>> partitioning = DirectoryPartitioning(
>      ...     pa.schema([("year", pa.int16()), ("month", pa.int8())]))
> -    >>> print(partitioning.parse("/2009/11"))
> +    >>> print(partitioning.parse("/2009/11/"))
>      ((year == 2009) and (month == 11))
>      """
>
> @@ -1595,7 +1595,7 @@ cdef class HivePartitioning(KeyValuePartitioning):
>      >>> from pyarrow.dataset import HivePartitioning
>      >>> partitioning = HivePartitioning(
>      ...     pa.schema([("year", pa.int16()), ("month", pa.int8())]))
> -    >>> print(partitioning.parse("/year=2009/month=11"))
> +    >>> print(partitioning.parse("/year=2009/month=11/"))
>      ((year == 2009) and (month == 11))
>
>      """
> ```
>
> @sanjibansg can you confirm that adding the `/` is required now? We can probably apply this minor fix on this PR.

Yes, I think it should work now; the trailing `/` is needed for the `Parse()` method to work in this case.

@amol- This change to the `Parse()` method was made to handle the different partitioning modes in a better way. The `Parse()` method now expects the complete path, like `"2009/11/1_part.parquet"`.
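To make the "complete path" idea concrete, here is a minimal plain-Python sketch of how a full path could be split into the part each partitioning mode cares about. It is only an illustration and not the Arrow implementation: the helper `split_for_partitioning` and the mode strings `"directory"` and `"filename"` are hypothetical names made up for this example.

```python
def split_for_partitioning(path: str, mode: str) -> str:
    """Hypothetical helper (not an Arrow API): return the portion of a
    complete path that a given partitioning mode would inspect."""
    # Split at the last "/": everything up to and including it is the
    # directory portion, everything after it is the file name.
    head, sep, filename = path.rpartition("/")
    directory = head + sep
    if mode == "directory":
        return directory
    if mode == "filename":
        return filename
    raise ValueError(f"unknown partitioning mode: {mode!r}")


full_path = "2009/11/1_part.parquet"
print(split_for_partitioning(full_path, "directory"))  # 2009/11/
print(split_for_partitioning(full_path, "filename"))   # 1_part.parquet

# Under this sketch, a path without a trailing "/" loses its last segment
# to the file-name portion, which is one way to picture why the docstring
# examples need "/2009/11/" rather than "/2009/11".
print(split_for_partitioning("/2009/11", "directory"))   # /2009/
print(split_for_partitioning("/2009/11/", "directory"))  # /2009/11/
```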
Directory partitioning needs the `"2009/11/"` part of that string, while filename partitioning needs the `"1_part.parquet"` part. So the `Parse()` method now extracts the required part of the string internally, depending on the partitioning mode.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org