vibhatha commented on a change in pull request #11911:
URL: https://github.com/apache/arrow/pull/11911#discussion_r770171487



##########
File path: python/pyarrow/dataset.py
##########
@@ -798,6 +800,20 @@ def write_dataset(data, base_dir, basename_template=None, 
format=None,
         used determined by the number of available CPU cores.
     max_partitions : int, default 1024
         Maximum number of partitions any batch may be written into.
+    max_open_files : int, default 1024
+        Maximum number of files that can be open at the same time.
+    max_rows_per_file : int, default 0
+        Maximum number of rows per file.
+    min_rows_per_group : int, default 0
+        Minimum number of rows per group. When the value is greater than 0,
+        the dataset writer will batch incoming data and only write row
+        groups to disk once sufficient rows have accumulated.
+    max_rows_per_group : int, default 1 << 20

Review comment:
       I agree, it is more readable. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
