sanjibansg commented on PR #13260:
URL: https://github.com/apache/arrow/pull/13260#issuecomment-1141591091

   > > It may be because of the changed definition of the `Parse()` method. See the changed test case here, [adb5b00#diff-063096b5f67bfb7437c260bf755259ba4251db3ff7e6aca6ff0cb554181b4461R607](https://github.com/apache/arrow/commit/adb5b00023bf57122c6de540746008eff2d1ad92#diff-063096b5f67bfb7437c260bf755259ba4251db3ff7e6aca6ff0cb554181b4461R607). We would need to change the example string for the `Parse()` method accordingly.
   > 
   > Applying this diff fixes it locally:
   > 
   > ```
   > $ git diff
   > diff --git a/python/pyarrow/_dataset.pyx b/python/pyarrow/_dataset.pyx
   > index 83c131c..fbde03f 100644
   > --- a/python/pyarrow/_dataset.pyx
   > +++ b/python/pyarrow/_dataset.pyx
   > @@ -1468,7 +1468,7 @@ cdef class DirectoryPartitioning(KeyValuePartitioning):
   >      >>> from pyarrow.dataset import DirectoryPartitioning
   >      >>> partitioning = DirectoryPartitioning(
   >      ...     pa.schema([("year", pa.int16()), ("month", pa.int8())]))
   > -    >>> print(partitioning.parse("/2009/11"))
   > +    >>> print(partitioning.parse("/2009/11/"))
   >      ((year == 2009) and (month == 11))
   >      """
   >  
   > @@ -1595,7 +1595,7 @@ cdef class HivePartitioning(KeyValuePartitioning):
   >      >>> from pyarrow.dataset import HivePartitioning
   >      >>> partitioning = HivePartitioning(
   >      ...     pa.schema([("year", pa.int16()), ("month", pa.int8())]))
   > -    >>> print(partitioning.parse("/year=2009/month=11"))
   > +    >>> print(partitioning.parse("/year=2009/month=11/"))
   >      ((year == 2009) and (month == 11))
   >  
   >      """
   > ```
   > 
   > @sanjibansg can you confirm that adding the `/` is required now? We can probably apply this minor fix on this PR.
   
   Yes, I think it should work now. The `Parse()` method needs that trailing `/` to work in this case.
   @amol- This change to the `Parse()` method was made to handle the different partitioning modes in a better way. `Parse()` now expects the complete path, like `"2009/11/1_part.parquet"`, and internally extracts the part of the string it needs depending on the partitioning mode: the string `"2009/11/"` for directory partitioning, or the string `"1_part.parquet"` for filename partitioning.
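
To illustrate why the trailing `/` matters, here is a minimal pure-Python sketch of that kind of splitting. It is not Arrow's implementation: `segment_for_mode` and `directory_keys` are hypothetical helpers, and the sketch assumes the path is split at its last `/` into a directory part and a file name part.

```python
import posixpath


def segment_for_mode(path, mode):
    """Return the part of `path` that a given partitioning mode would inspect.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    # Split at the last "/": everything before it is the directory part,
    # everything after it is treated as the file name.
    directory, filename = posixpath.split(path)
    if mode == "directory":
        return directory.strip("/")
    if mode == "filename":
        return filename
    raise ValueError(f"unknown partitioning mode: {mode!r}")


def directory_keys(path, field_names):
    """Pair directory segments with field names (hypothetical helper)."""
    segment = segment_for_mode(path, "directory")
    values = segment.split("/") if segment else []
    return dict(zip(field_names, values))


# With the trailing "/", both segments belong to the directory part.
print(directory_keys("/2009/11/", ["year", "month"]))
# {'year': '2009', 'month': '11'}

# Without it, "11" is taken as the file name and "month" is lost.
print(directory_keys("/2009/11", ["year", "month"]))
# {'year': '2009'}

print(segment_for_mode("2009/11/1_part.parquet", "filename"))
# 1_part.parquet
```

Under this assumption, `"/2009/11"` is read as a file named `11` inside the directory `2009`, which would explain why the docstring examples need the trailing `/`.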


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.