[ https://issues.apache.org/jira/browse/ARROW-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-15409:
-----------------------------------
    Labels: good-first-issue pull-request-available  (was: good-first-issue)

> [C++] The C++ API for writing datasets could be improved
> --------------------------------------------------------
>
>                 Key: ARROW-15409
>                 URL: https://issues.apache.org/jira/browse/ARROW-15409
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Alvin Chunga Mamani
>            Priority: Major
>              Labels: good-first-issue, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I was working on write dataset testing in the C++ API today and ran into a 
> number of things that were not very intuitive.  All of these are abstracted 
> away / hidden by the Python / R interfaces so this really only applies to 
> anyone using the C++ API directly.
>  * If no partitioning is specified, the write will segfault.  Instead it 
> should use a default (no-op) partitioning.
>  * The min_rows_per_group option should probably default to something higher 
> than 0
>  * It's not clear how to specify the format (you do it by creating a format, 
> then setting the file write options, which sets the format privately)
>  * There is no default for basename_template
>  * There is no default for filesystem (should be local filesystem)
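
As a rough illustration of the defaults requested in the list above, here is a minimal, self-contained sketch. It does not use the Arrow API: WriteOptionsSketch, PartitioningSketch, NoOpPartitioningSketch and WithDefaults are hypothetical names invented for this example, and the concrete default values ("part-{i}", "local", 1024) are placeholders rather than values proposed in the issue. It only shows the general idea of filling unset options with safe defaults instead of dereferencing a null partitioning.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT Arrow types.
struct PartitioningSketch {
  virtual ~PartitioningSketch() = default;
  // Maps a partition key to a sub-directory; "" means "write into base_dir".
  virtual std::string DirectoryFor(const std::string& key) const = 0;
};

// No-op partitioning: every batch lands directly in the base directory.
struct NoOpPartitioningSketch : PartitioningSketch {
  std::string DirectoryFor(const std::string&) const override { return ""; }
};

struct WriteOptionsSketch {
  std::shared_ptr<PartitioningSketch> partitioning;  // null = not specified
  std::string basename_template;                     // empty = not specified
  std::string filesystem;                            // empty = not specified
  // Arbitrary positive placeholder, not a value proposed in the issue.
  std::int64_t min_rows_per_group = 1024;
};

// Returns a copy of opts with every unset field replaced by a default,
// so the writer never has to handle a null partitioning.
WriteOptionsSketch WithDefaults(WriteOptionsSketch opts) {
  if (!opts.partitioning) {
    opts.partitioning = std::make_shared<NoOpPartitioningSketch>();
  }
  if (opts.basename_template.empty()) {
    opts.basename_template = "part-{i}";  // assumed template, for illustration
  }
  if (opts.filesystem.empty()) {
    opts.filesystem = "local";  // stands in for a local filesystem object
  }
  assert(opts.partitioning != nullptr);  // postcondition of the defaulting step
  return opts;
}
```

In this sketch a caller would pass a default-constructed WriteOptionsSketch through WithDefaults before writing; whether the real fix belongs in the options struct or in the write call is left open here.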



--
This message was sent by Atlassian Jira
(v8.20.10#820010)