vibhatha commented on a change in pull request #12112:
URL: https://github.com/apache/arrow/pull/12112#discussion_r786377629
##########
File path: docs/source/python/dataset.rst
##########
@@ -699,3 +699,46 @@ Parquet files:
# also clean-up custom base directory used in some examples
shutil.rmtree(str(base), ignore_errors=True)
+
+
+Configuring files open during a write
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When writing data to disk, there are a few parameters that can be
+important for optimizing the writes, such as the number of rows
+per file and the number of files open during the write.
+
+The number of files opened during a write can be set as follows:
+
+.. ipython:: python
+
+    ds.write_dataset(data=table, base_dir="data_dir",
+                     max_open_files=max_open_files)
+
+The maximum number of rows per file can be set as follows:
+
+.. ipython:: python
+
+    ds.write_dataset(record_batch, "data_dir", format="parquet",
+                     max_rows_per_file=max_rows_per_file)
Review comment:
I replaced this with a short description of the option name and its
functionality (a minor modification to the C++ docstring, added here).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]