amoeba commented on code in PR #43740:
URL: https://github.com/apache/arrow/pull/43740#discussion_r1744706643
##########
python/pyarrow/_dataset.pyx:
##########
@@ -2505,6 +2505,43 @@ cdef class Partitioning(_Weakrefable):
result = self.partitioning.Parse(tobytes(path))
return Expression.wrap(GetResultValue(result))
+ def format(self, expr):
+ """
+ Convert a filter expression into a tuple of (directory, filename) using
+ the current partitioning scheme
+
+ Parameters
+ ----------
+ expr : pyarrow.dataset.Expression
+
+ Returns
+ -------
+ tuple[str, str]
+
+ Examples
+ --------
+
+ Specify the Schema for paths like "/2009/June":
+
+ >>> import pyarrow as pa
+ >>> import pyarrow.dataset as ds
+ >>> import pyarrow.compute as pc
+ >>> part = ds.partitioning(pa.schema([("year", pa.int16()),
+ ... ("month", pa.string())]))
+ >>> part.format(
+ ... (pc.field("year") == 1862) & (pc.field("month") == "Jan")
+ ... )
+ """
+ cdef:
+ CResult[CPartitionPathFormat] result
+ CPartitionPathFormat result_value
+ result = self.partitioning.Format(
+ Expression.unwrap(expr)
+ )
+ result_value = GetResultValue(result)
+
+ return frombytes(result_value.directory), frombytes(result_value.filename)
Review Comment:
```suggestion
cdef:
    CPartitionPathFormat result

result = GetResultValue(self.partitioning.Format(
    Expression.unwrap(expr)
))

return frombytes(result.directory), frombytes(result.filename)
```
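For reference, here is a minimal pure-Python sketch of the `(directory, filename)` shape the docstring describes. It does not call pyarrow: `format_directory_partition` is a hypothetical helper, and it assumes a directory-style scheme that joins the partition values in schema order with "/" and leaves the filename empty.

```python
# Hypothetical stand-in for a directory-style Partitioning.format().
# It only mirrors the (directory, filename) return shape; it is not pyarrow code.
def format_directory_partition(field_names, values):
    """Join the partition values, in schema order, into a directory path."""
    segments = []
    for name in field_names:
        if name not in values:
            # Simplification in this sketch: stop at the first field
            # that the filter does not pin to a value.
            break
        segments.append(str(values[name]))
    # Assumption: directory partitioning contributes nothing to the filename.
    return "/".join(segments), ""


directory, filename = format_directory_partition(
    ["year", "month"], {"year": 1862, "month": "Jan"}
)
print((directory, filename))  # ('1862/Jan', '')
```

The docstring example would be easier to follow if it showed the returned tuple as well.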
##########
python/pyarrow/tests/parquet/test_dataset.py:
##########
@@ -1216,7 +1222,6 @@ def test_read_table_duplicate_column_selection(tempdir):
def test_dataset_partitioning(tempdir):
- import pyarrow.dataset as ds
Review Comment:
I don't think this change and a few others in this file are necessary.
Can you revert just the ones that are formatting-related?