[ 
https://issues.apache.org/jira/browse/ARROW-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace updated ARROW-15409:
--------------------------------
    Labels: good-first-issue  (was: )

> [C++] The C++ API for writing datasets could be improved
> --------------------------------------------------------
>
>                 Key: ARROW-15409
>                 URL: https://issues.apache.org/jira/browse/ARROW-15409
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>              Labels: good-first-issue
>
> I was working on write dataset testing in the C++ API today and ran into a 
> number of things that were not very intuitive.  All of these are abstracted 
> away / hidden by the Python / R interfaces, so this really only applies to 
> anyone using the C++ API directly.
>  * If no partitioning is specified the write will segfault.  Instead it 
> should use a default (no-op) partitioning.
>  * The min_rows_per_group option should probably default to something higher 
> than 0.
>  * It's not clear how to specify the format (you do it by creating a format, 
> then setting the file write options, which sets the format privately).
>  * There is no default for basename_template.
>  * There is no default for filesystem (it should be the local filesystem).
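
The defaulting behaviour requested in the list above can be sketched in a few lines of stand-alone C++. This is only an illustration under stated assumptions, not Arrow code: `Partitioning`, `NoOpPartitioning`, `WriteOptions` and `WithDefaults` are hypothetical stand-ins, the filesystem is modelled as a plain string, and the fallback values ("part-{i}.arrow", "local") are placeholders chosen for the example. min_rows_per_group is left out because the issue does not name a value.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT Arrow classes.
struct Partitioning {
  virtual ~Partitioning() = default;
  // Sub-directory to write into; an empty string means "no partitioning".
  virtual std::string Format() const = 0;
};

// A no-op partitioning: every batch goes straight into the base directory.
struct NoOpPartitioning : Partitioning {
  std::string Format() const override { return ""; }
};

// Simplified model of the write options discussed in the issue.
struct WriteOptions {
  std::shared_ptr<Partitioning> partitioning;  // callers may leave this null
  std::string basename_template;               // callers may leave this empty
  std::string filesystem;                      // modelled as a name, e.g. "local"
};

// Fill in every unset field so the writer never dereferences a null
// partitioning or runs with an empty template or filesystem.
WriteOptions WithDefaults(WriteOptions opts) {
  if (!opts.partitioning) {
    opts.partitioning = std::make_shared<NoOpPartitioning>();
  }
  if (opts.basename_template.empty()) {
    opts.basename_template = "part-{i}.arrow";  // placeholder default
  }
  if (opts.filesystem.empty()) {
    opts.filesystem = "local";  // placeholder for "local filesystem"
  }
  return opts;
}
```

A writer built this way would call `WithDefaults(options)` once on entry, so values supplied by the caller are kept and only the missing ones are filled in.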



--
This message was sent by Atlassian Jira
(v8.20.10#820010)