[ https://issues.apache.org/jira/browse/ARROW-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weston Pace updated ARROW-15409:
--------------------------------
Labels: good-first-issue (was: )
> [C++] The C++ API for writing datasets could be improved
> --------------------------------------------------------
>
> Key: ARROW-15409
> URL: https://issues.apache.org/jira/browse/ARROW-15409
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
> Labels: good-first-issue
>
> I was working on write dataset testing in the C++ API today and ran into a
> number of things that were not very intuitive. All of these are abstracted
> away / hidden by the Python / R interfaces, so this really only applies to
> anyone using the C++ API directly.
> * If no partitioning is specified, the write will segfault. Instead, it
> should use a default (no-op) partitioning.
> * The min_rows_per_group option should probably default to something higher
> than 0
> * It's not clear how to specify the format (you do it by creating a format,
> then setting the file write options, which sets the format privately)
> * There is no default for basename_template
> * There is no default for filesystem (should be local filesystem)
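The proposed defaults can be illustrated with a small, self-contained C++ sketch. It does not use Arrow: `WriteOptionsSketch`, `ResolveDefaults`, `NoPartitioning`, `LocalFileSystem` and the other types below are hypothetical stand-ins whose field names only loosely follow the real write options. The sketch assumes a no-op partitioning and a local filesystem when those are unset, a `part-{i}.<format>` basename template (pyarrow's `write_dataset` defaults to `part-{i}.` plus the format's default extension), an arbitrary placeholder of 65536 for `min_rows_per_group`, and a format passed as a plain string rather than through the file write options.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>

// HYPOTHETICAL stand-ins for illustration only; these are not Arrow's classes.
struct Partitioning {
  virtual ~Partitioning() = default;
  virtual std::string Name() const = 0;
};

// No-op partitioning: every file is written directly under base_dir.
struct NoPartitioning : Partitioning {
  std::string Name() const override { return "none"; }
};

struct FileSystem {
  virtual ~FileSystem() = default;
  virtual std::string Name() const = 0;
};

struct LocalFileSystem : FileSystem {
  std::string Name() const override { return "local"; }
};

// Placeholder value: the issue only says the default should be higher than 0.
constexpr std::uint64_t kDefaultMinRowsPerGroup = 1 << 16;

// Hypothetical options struct; field names loosely follow the C++ write options.
struct WriteOptionsSketch {
  std::string base_dir;
  std::string format;                          // e.g. "parquet", set directly
  std::shared_ptr<Partitioning> partitioning;  // may be left null
  std::shared_ptr<FileSystem> filesystem;      // may be left null
  std::string basename_template;               // may be left empty
  std::uint64_t min_rows_per_group = kDefaultMinRowsPerGroup;
};

// Returns a copy with every unset option filled in, so a missing partitioning
// or filesystem becomes a sensible default instead of a null dereference.
WriteOptionsSketch ResolveDefaults(WriteOptionsSketch opts) {
  if (opts.format.empty()) {
    // There is no obvious default format, so fail loudly with a clear message.
    throw std::invalid_argument("a file format must be specified");
  }
  if (!opts.partitioning) opts.partitioning = std::make_shared<NoPartitioning>();
  if (!opts.filesystem) opts.filesystem = std::make_shared<LocalFileSystem>();
  if (opts.basename_template.empty()) {
    // Simplification: uses the format name as the file extension.
    opts.basename_template = "part-{i}." + opts.format;
  }
  assert(opts.partitioning && opts.filesystem);
  return opts;
}
```

Calling `ResolveDefaults` on options with only `base_dir` and `format` set yields a no-op partitioning, a local filesystem and the basename template `part-{i}.parquet`, while explicitly set fields are kept as-is. Leaving the format empty throws `std::invalid_argument` up front instead of failing later; requiring an explicit format rather than guessing one is a design choice of this sketch, not something the issue prescribes.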
--
This message was sent by Atlassian Jira
(v8.20.10#820010)