Weston Pace created ARROW-15409:
-----------------------------------
Summary: [C++] The C++ API for writing datasets could be improved
Key: ARROW-15409
URL: https://issues.apache.org/jira/browse/ARROW-15409
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
I was working on write dataset testing in the C++ API today and ran into a
number of things that were not very intuitive. All of these are abstracted
away / hidden by the Python / R interfaces, so this really only applies to anyone
using the C++ API directly.
* If no partitioning is specified the write will segfault. Instead it should
use a default (no-op) partitioning.
* The min_rows_per_group option should probably default to something higher
than 0
* It's not clear how to specify the format (you do it by creating a format,
then setting the file write options, which sets the format privately)
* There is no default for basename_template
* There is no default for filesystem (should be local filesystem)
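
To make the requested behavior concrete, here is a minimal sketch of filling in defaults before a write, so that an unset field is replaced or reported instead of dereferenced. Everything in it is a hypothetical stand-in and not Arrow's API: the WriteOptions, Status, NoOpPartitioning and LocalFileSystem types, the NormalizeWriteOptions function, and the "part-{i}.<extension>" default template are all assumptions made for illustration. min_rows_per_group is left out because the issue does not propose a specific value.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT Arrow's classes.
struct Partitioning {
  virtual ~Partitioning() = default;
  virtual std::string Name() const = 0;
};
struct NoOpPartitioning : Partitioning {
  std::string Name() const override { return "no-op"; }
};
struct FileSystem {
  virtual ~FileSystem() = default;
  virtual std::string Name() const = 0;
};
struct LocalFileSystem : FileSystem {
  std::string Name() const override { return "local"; }
};

// Hypothetical options struct mirroring the fields named in the issue.
struct WriteOptions {
  std::string format_extension;                // e.g. "parquet"; empty if unset
  std::shared_ptr<Partitioning> partitioning;  // null if unset
  std::shared_ptr<FileSystem> filesystem;      // null if unset
  std::string basename_template;               // empty if unset
};

struct Status {
  bool ok;
  std::string message;
};

// Fills in defaults for unset fields and reports an error (rather than
// crashing later) when a required field has no sensible default.
Status NormalizeWriteOptions(WriteOptions& opts) {
  if (opts.format_extension.empty()) {
    return Status{false, "a file format must be specified"};
  }
  if (!opts.partitioning) {
    opts.partitioning = std::make_shared<NoOpPartitioning>();
  }
  if (!opts.filesystem) {
    opts.filesystem = std::make_shared<LocalFileSystem>();
  }
  if (opts.basename_template.empty()) {
    // Assumed default naming scheme; "{i}" stands for a file counter.
    opts.basename_template = "part-{i}." + opts.format_extension;
  }
  // Postcondition: the writer can rely on these pointers being non-null.
  assert(opts.partitioning && opts.filesystem);
  return Status{true, ""};
}
```

With this shape, a caller who sets only the format gets a no-op partitioning, the local filesystem, and a generated basename template, while a caller who sets nothing gets an error status back instead of a crash.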