Weston Pace created ARROW-14164:
-----------------------------------
Summary: [C++][Dataset] Enhance dataset writer to allow file-per-batch
Key: ARROW-14164
URL: https://issues.apache.org/jira/browse/ARROW-14164
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
The dataset writer currently groups incoming batches into large files whose
size is capped by max_rows_per_file. In the PR for this work, [~jorisvandenbossche]
recommended an option where each incoming batch creates a new file.
This would give the user fine-grained control over how many files are created,
provided they are doing a very basic scan/filter/project and not using any
more sophisticated nodes, which may resize batches.
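
To make the difference concrete, here is a toy model of the two behaviours. It is only a sketch and does not use the Arrow dataset writer API: the function names GroupByMaxRows and FilePerBatch are hypothetical, batches are represented by their row counts alone, and it assumes a batch that crosses the max_rows_per_file limit is split at the limit.

```cpp
#include <cstdint>
#include <vector>

// Toy model only; this is NOT the Arrow dataset writer API.
// Each element of the returned vector is the row count of one output file.

// Current behaviour as described above: rows from incoming batches are packed
// into a file until it holds max_rows_per_file rows, then a new file starts.
// Assumption: a batch crossing the limit is split at the limit.
// Precondition: max_rows_per_file > 0.
std::vector<int64_t> GroupByMaxRows(const std::vector<int64_t>& batch_rows,
                                    int64_t max_rows_per_file) {
  std::vector<int64_t> files;
  int64_t current = 0;  // rows already placed in the file being filled
  for (int64_t rows : batch_rows) {
    while (rows > 0) {
      const int64_t room = max_rows_per_file - current;
      const int64_t take = rows < room ? rows : room;
      current += take;
      rows -= take;
      if (current == max_rows_per_file) {
        files.push_back(current);  // file is full, close it
        current = 0;
      }
    }
  }
  if (current > 0) files.push_back(current);  // flush the partial last file
  return files;
}

// Proposed option: every incoming batch becomes its own file, so the number
// of files equals the number of batches the writer receives.
std::vector<int64_t> FilePerBatch(const std::vector<int64_t>& batch_rows) {
  return batch_rows;
}
```

With three incoming batches of 400 rows and a limit of 1000 rows, the first function produces two files while the second produces three, one per batch. If an upstream node resizes batches, the input to the second function changes and so does the file count.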