wjones127 commented on code in PR #14646:
URL: https://github.com/apache/arrow/pull/14646#discussion_r1024364550


##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -4912,3 +4912,33 @@ def test_read_table_nested_columns(tempdir, format):
         {'user_id': 'qrs456', 'type': 'scroll', 'values': [None, 3, 4],
          'structs': [{'fizz': 'buzz', 'foo': None}], 'a.dotted.field': 2}
     ]
+
+
+def test_dataset_partition_with_slash(tmpdir):
+    from pyarrow import dataset as ds
+
+    path = tmpdir / "slash-writer-x"
+
+    dt_table = pa.Table.from_arrays([
+        pa.array([1, 2, 3, 4, 5], pa.int32()),
+        pa.array(["experiment/A/f.csv", "experiment/B/f.csv",
+                  "experiment/A/f.csv", "experiment/C/k.csv",
+                  "experiment/M/i.csv"], pa.utf8())], ["exp_id", "exp_meta"])
+
+    ds.write_dataset(
+        data=dt_table,
+        base_dir=path,
+        format='parquet',
+        partitioning=['exp_meta'],
+        partitioning_flavor='hive',
+    )
+
+    read_table = ds.dataset(
+        source=path,
+        format='parquet',
+        partitioning='hive',
+        schema=pa.schema([pa.field("exp_id", pa.int32()),
+                         pa.field("exp_meta", pa.utf8())])
+    ).to_table().combine_chunks()
+

Review Comment:
   Could we also assert the names of the escaped partition directories? Since 
the goal is compatibility with other systems, it seems wise to pin down the 
URI-encoded form itself, rather than only asserting that we can round-trip.
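
Something along these lines might work, as a rough sketch rather than a definitive check. It assumes the writer produces hive-style `exp_meta=<value>` directories with the value percent-encoded and `/` escaped as `%2F`; that expected form should be confirmed against what the writer actually emits. Both helper names are hypothetical and use only the standard library.

```python
import os
from urllib.parse import quote


def expected_partition_dirs(field, values):
    # Hypothetical helper. Assumes hive-style "field=value" directory names
    # where the value is percent-encoded with no characters left unescaped,
    # so "/" becomes "%2F".
    return sorted({f"{field}={quote(value, safe='')}" for value in values})


def actual_partition_dirs(base_dir):
    # Hypothetical helper. Lists only the top-level directories under
    # base_dir, which is where the first partition field shows up.
    return sorted(
        name for name in os.listdir(base_dir)
        if os.path.isdir(os.path.join(base_dir, name))
    )
```

In the test this could be used after `ds.write_dataset(...)` as `assert actual_partition_dirs(path) == expected_partition_dirs("exp_meta", dt_table["exp_meta"].to_pylist())`, or with the expected names written out literally if we want the encoded form spelled out in the test itself.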



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to