Weston Pace created ARROW-13333:
-----------------------------------

             Summary: [C++] [Dataset] Support max file size option in write dataset
                 Key: ARROW-13333
                 URL: https://issues.apache.org/jira/browse/ARROW-13333
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


The existence of FileSystemDatasetWriteOptions::basename_template would seem to 
imply that the dataset writer may write multiple files for a given partition.  
However, the current implementation always creates one file per directory.

 

I'm not sure what the desired behavior is here, but the two obvious choices are:

 * Get rid of FileSystemDatasetWriteOptions::basename_template (or at least the 
{i} parameter)

 * Add an option to limit how many rows/bytes are put in a single file
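
To make the second choice concrete, here is a rough, standalone sketch of how a row limit could interact with the {i} placeholder. It does not use the Arrow API. FormatBasename, PlanFiles, PlannedFile and max_rows_per_file are hypothetical names invented for illustration, and the rollover policy (split a batch when the current file is full) is only one possible assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper (not part of Arrow): substitute the first "{i}"
// placeholder in a basename template with the file index.
std::string FormatBasename(const std::string& basename_template, int index) {
  const std::string token = "{i}";
  std::string out = basename_template;
  const std::string::size_type pos = out.find(token);
  if (pos != std::string::npos) {
    out.replace(pos, token.size(), std::to_string(index));
  }
  return out;
}

// Hypothetical description of one output file within a partition directory.
struct PlannedFile {
  std::string basename;
  int64_t num_rows;
};

// Assign incoming batches (given only by their row counts) to files so that
// no file exceeds max_rows_per_file. A limit <= 0 is treated as "no limit",
// which yields a single file per directory.
std::vector<PlannedFile> PlanFiles(const std::vector<int64_t>& batch_rows,
                                   const std::string& basename_template,
                                   int64_t max_rows_per_file) {
  const int64_t limit = max_rows_per_file > 0
                            ? max_rows_per_file
                            : std::numeric_limits<int64_t>::max();
  std::vector<PlannedFile> files;
  for (int64_t remaining : batch_rows) {
    while (remaining > 0) {
      // Start a new file when there is none yet or the current one is full.
      if (files.empty() || files.back().num_rows >= limit) {
        const int index = static_cast<int>(files.size());
        files.push_back(PlannedFile{FormatBasename(basename_template, index), 0});
      }
      // Put as many rows as still fit into the current file.
      const int64_t room = limit - files.back().num_rows;
      const int64_t take = std::min(remaining, room);
      files.back().num_rows += take;
      remaining -= take;
    }
  }
  return files;
}
```

With batches of 5, 5 and 3 rows, a template of "part-{i}.parquet" and a limit of 4 rows, this plans four files (part-0.parquet through part-3.parquet) holding 4, 4, 4 and 1 rows. A byte-based limit would follow the same shape but would need an estimate of the encoded size of each batch.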

 

ARROW-12358 is probably worth mentioning, as whatever strategy is chosen here 
should be compatible with supporting append mode in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
