[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11008: ARROW-13755: [Python] Allow writing datasets using a partitioning that only specifies field_names

GitBox Thu, 16 Sep 2021 01:50:06 -0700


jorisvandenbossche commented on a change in pull request #11008:
URL: https://github.com/apache/arrow/pull/11008#discussion_r709917673




##########
File path: python/pyarrow/dataset.py
##########
@@ -678,17 +678,32 @@ def dataset(source, schema=None, format=None, 
filesystem=None,
         )
 
 
-def _ensure_write_partitioning(scheme):
-    if scheme is None:
-        scheme = partitioning(pa.schema([]))
-    if not isinstance(scheme, Partitioning):
-        # TODO support passing field names, and get types from schema
-        raise ValueError("partitioning needs to be actual Partitioning object")
-    return scheme
+def _ensure_write_partitioning(part, schema, flavor):
+    if isinstance(part, Partitioning) and flavor:
+        raise ValueError(
+            "Providing a partitioning_flavor with "
+            "a Partitioning object is not supported"
+        )
+    elif isinstance(part, (tuple, list)):
+        # Name of fields were provided instead of a partitioning object.
+        # Create a partitioning factory with those field names.
+        part = partitioning(
+            schema=pa.schema([schema.field(f) for f in part]),
+            flavor=flavor
+        )
+    elif part is None:
+        part = partitioning(pa.schema([]), flavor=flavor)

Review comment:
       I don't think adding the flavor matters here, but probably also can't 
hurt (IIRC this line was needed because the C++ method required an empty 
partitioning instead of just a null pointer, this might have changed though)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11008: ARROW-13755: [Python] Allow writing datasets using a partitioning that only specifies field_names

Reply via email to