[GitHub] [arrow] vibhatha commented on a change in pull request #12112: ARROW-15183: [Python][Docs] Add Missing Dataset Write Options

GitBox Mon, 14 Mar 2022 22:34:15 -0700


vibhatha commented on a change in pull request #12112:
URL: https://github.com/apache/arrow/pull/12112#discussion_r826595344




##########
File path: docs/source/python/dataset.rst
##########
@@ -613,6 +613,60 @@ guidelines apply. Row groups can provide parallelism when reading and allow data
 based on statistics, but very small groups can cause metadata to be a significant portion
 of file size. Arrow's file writer provides sensible defaults for group sizing in most cases.
 
+Configuring files open during a write
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When writing data to the disk, there are a few parameters that can be 
+important to optimize the writes, i.e number of rows per file and
+number of files open during write. 
+
+Set the maximum number of files opened with the ``max_open_files`` parameter of
+:meth:`write_dataset`.
+
+If  ``max_open_files`` is set greater than 0 then this will limit the maximum 
+number of files that can be left open. If an attempt is made to open too many 
+files then the least recently used file will be closed.  If this setting is set

Review comment:
       This is another overlapping suggestion. I think the previous suggestion 
contained the expected idea. What do you think @wjones127? 
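
For readers following the quoted passage, the "close the least recently used file" behaviour described for ``max_open_files`` can be illustrated with a small standalone sketch. This is not Arrow's implementation and does not call pyarrow. The class `LruFileCache` and the helper `demo` are hypothetical names invented for illustration. Reopening an evicted file in append mode is also an assumption made to keep the sketch short, so the real writer may handle that case differently.

```python
from collections import OrderedDict
import os
import tempfile


class LruFileCache:
    """Hypothetical helper: keep at most `max_open_files` handles open."""

    def __init__(self, max_open_files):
        self.max_open_files = max_open_files
        self._open = OrderedDict()  # path -> handle, least recently used first
        self.closed_order = []      # paths in the order they were evicted

    def write(self, path, text):
        if path in self._open:
            # Already open: mark it as the most recently used handle.
            self._open.move_to_end(path)
        else:
            limit_reached = (
                self.max_open_files > 0
                and len(self._open) >= self.max_open_files
            )
            if limit_reached:
                # Close the least recently used handle before opening another.
                old_path, old_handle = self._open.popitem(last=False)
                old_handle.close()
                self.closed_order.append(old_path)
            # Assumption for this sketch: append so a reopened file keeps its rows.
            self._open[path] = open(path, "a")
        self._open[path].write(text)

    def close_all(self):
        for handle in self._open.values():
            handle.close()
        self._open.clear()


def demo():
    """Write to files a, b, a, c with a limit of two open files."""
    with tempfile.TemporaryDirectory() as base_dir:
        cache = LruFileCache(max_open_files=2)
        for name in ["a", "b", "a", "c"]:
            cache.write(os.path.join(base_dir, name + ".txt"), name + "\n")
        evicted = [os.path.basename(p) for p in cache.closed_order]
        cache.close_all()
        return evicted
```

With a limit of two, writing to `a`, `b`, `a`, `c` evicts `b.txt`: `a` was touched again before `c` arrived, so `b` is the least recently used handle at that point.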




