Weston Pace created ARROW-15409:
-----------------------------------

             Summary: [C++] The C++ API for writing datasets could be improved
                 Key: ARROW-15409
                 URL: https://issues.apache.org/jira/browse/ARROW-15409
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


I was working on write dataset testing in the C++ API today and ran into a 
number of things that were not very intuitive.  All of these are abstracted 
away / hidden by the Python / R interfaces, so this really only applies to 
anyone using the C++ API directly.

 * If no partitioning is specified the write will segfault.  Instead it should 
use a default (no-op) partitioning.
 * The min_rows_per_group option should probably default to something higher 
than 0
 * It's not clear how to specify the format (you do it by creating a format, 
then setting the file write options, which sets the format privately)
 * There is no default for basename_template
 * There is no default for filesystem (should be local filesystem)
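
To make the defaults proposed in the list above concrete, here is a minimal, 
self-contained sketch of the idea.  It does not use the Arrow API: 
PartitioningSketch, FileSystemSketch, WriteOptionsSketch and WithDefaults are 
hypothetical stand-ins, and the values chosen for basename_template and 
min_rows_per_group are placeholder assumptions, not numbers or names proposed 
in this issue.  Format selection is left out.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT Arrow classes.
struct PartitioningSketch {
  std::string name;
};

struct FileSystemSketch {
  std::string name;
};

// Assumed placeholder values; the issue only asks that defaults exist.
constexpr std::int64_t kAssumedMinRowsPerGroup = 1 << 20;
const char kAssumedBasenameTemplate[] = "part-{i}";

// Hypothetical options struct. Value fields get member initializers so that a
// default-constructed instance is already usable; pointer fields start empty
// and are resolved later by WithDefaults().
struct WriteOptionsSketch {
  std::shared_ptr<PartitioningSketch> partitioning;
  std::shared_ptr<FileSystemSketch> filesystem;
  std::string basename_template = kAssumedBasenameTemplate;
  std::int64_t min_rows_per_group = kAssumedMinRowsPerGroup;
};

// Fill in unset pointer fields with fallbacks so that later code never
// dereferences a null pointer. Fields the caller did set are left untouched.
WriteOptionsSketch WithDefaults(WriteOptionsSketch opts) {
  if (!opts.partitioning) {
    opts.partitioning = std::make_shared<PartitioningSketch>(
        PartitioningSketch{"default (no-op)"});
  }
  if (!opts.filesystem) {
    opts.filesystem =
        std::make_shared<FileSystemSketch>(FileSystemSketch{"local"});
  }
  return opts;
}
```

In this sketch a caller would pass a default-constructed WriteOptionsSketch 
through WithDefaults() before writing, so a missing partitioning or filesystem 
is replaced by a fallback rather than causing a crash.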



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
