jorisvandenbossche commented on a change in pull request #10955:
URL: https://github.com/apache/arrow/pull/10955#discussion_r718483793
##########
File path: cpp/src/arrow/dataset/file_base.h
##########
@@ -364,6 +376,18 @@ struct ARROW_DS_EXPORT FileSystemDatasetWriteOptions {
/// {i} will be replaced by an auto incremented integer.
std::string basename_template;
+ /// If greater than 0 then this will limit the maximum number of files that can be left
+ /// open. If an attempt is made to open too many files then the least recently used file
+ /// will be closed. If this setting is set too low you may end up fragmenting your data
+ /// into many small files.
+ uint32_t max_open_files = 1024;
+
+ /// If greater than 0 then this will limit how many rows are placed in any single file.
Review comment:
Thanks for the update. That's clear now.
(I do wonder whether this is actually a good default, or whether we should
set some default max file size instead. That probably depends a lot on the
use case and on whether the file format can easily handle large files, so it
might be better to explicitly leave it up to the user.)
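
To make the `max_open_files` trade-off described in the doc comment concrete, here is a minimal sketch of least-recently-used closing. It is not the implementation in this PR: `OpenFileLru`, `Touch`, and `CountEvictions` are hypothetical names, and "closing" a file is only modelled as dropping its path from a list. It assumes that a limit of 0 means "no limit", mirroring the "if greater than 0" wording above.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of Arrow: tracks which output paths are
// "open" and reports which ones get closed when the limit is exceeded.
class OpenFileLru {
 public:
  explicit OpenFileLru(uint32_t max_open_files)
      : max_open_files_(max_open_files) {}

  // Marks `path` as used. Returns the paths closed to make room (often empty).
  std::vector<std::string> Touch(const std::string& path) {
    std::vector<std::string> closed;
    auto it = index_.find(path);
    if (it != index_.end()) {
      // Already open: move it to the most-recently-used position.
      order_.splice(order_.begin(), order_, it->second);
      return closed;
    }
    // Assumption: a limit of 0 disables the check entirely.
    while (max_open_files_ > 0 && order_.size() >= max_open_files_) {
      closed.push_back(order_.back());  // least recently used path
      index_.erase(order_.back());
      order_.pop_back();
    }
    order_.push_front(path);
    index_[path] = order_.begin();
    return closed;
  }

  std::size_t num_open() const { return order_.size(); }

 private:
  uint32_t max_open_files_;
  std::list<std::string> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};

// Counts how many times a file is closed early when batches arrive for the
// given sequence of destination paths.
inline std::size_t CountEvictions(const std::vector<std::string>& paths,
                                  uint32_t max_open_files) {
  OpenFileLru lru(max_open_files);
  std::size_t evictions = 0;
  for (const auto& path : paths) evictions += lru.Touch(path).size();
  return evictions;
}
```

With batches alternating between two destinations and a limit of 1, every switch closes the other file. In a real writer each such close would presumably end one output file and start another, which is the fragmentation the doc comment warns about.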