amol- commented on a change in pull request #11008:
URL: https://github.com/apache/arrow/pull/11008#discussion_r709899978
##########
File path: python/pyarrow/dataset.py
##########
@@ -714,9 +729,12 @@ def write_dataset(data, base_dir, basename_template=None,
format=None,
and `format` is not specified, it defaults to the same format as the
specified FileSystemDataset. When writing a Table or RecordBatch, this
keyword is required.
- partitioning : Partitioning, optional
+ partitioning : Partitioning or list[str], optional
The partitioning scheme specified with the ``partitioning()``
- function.
+ function or as a list of field names.
+ partitioning_flavor : str, optional
Review comment:
The default behaviour is equal to providing `partitioning(pa.schema([]))`
(I haven't changed this).
I would gladly document it, but I'm unsure about what a partitioning with an
empty schema means.
From what I can see, it works by writing files without any directory structure.
What I'm not sure about is whether the data will ever be split into multiple
files, or whether we will always write only `part-0.parquet`, since without a
partitioning column there are no chunks to split on.
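
For reference, a minimal sketch of the behaviour I'm describing (this is my
reading of it, assuming a small in-memory table; the output file names such as
`part-0.parquet` come from the default `basename_template`, and the `out_*`
directory names are just placeholders):

```python
import pyarrow as pa
import pyarrow.dataset as ds

table = pa.table({"year": [2020, 2021, 2021], "value": [1.0, 2.0, 3.0]})

# Omitting `partitioning`: files land directly in the base directory,
# e.g. out_default/part-0.parquet with the default basename_template.
ds.write_dataset(table, "out_default", format="parquet")

# Passing a partitioning built from an empty schema: as far as I can tell,
# this behaves the same and no partition directories are created.
ds.write_dataset(
    table, "out_empty_schema", format="parquet",
    partitioning=ds.partitioning(pa.schema([])),
)

# With the keywords added in this PR, a list of field names plus a flavor
# can be passed instead of a Partitioning object (usage based on the
# docstring above):
ds.write_dataset(
    table, "out_hive", format="parquet",
    partitioning=["year"], partitioning_flavor="hive",
)
```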