[ 
https://issues.apache.org/jira/browse/ARROW-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-16480:
-----------------------------------
    Labels: good-first-issue good-second-issue pull-request-available  (was: 
good-first-issue good-second-issue)

> [R] Update read_csv_arrow and open_dataset parse_options, read_options, and 
> convert_options to take lists
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-16480
>                 URL: https://issues.apache.org/jira/browse/ARROW-16480
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: R
>            Reporter: Nicola Crane
>            Assignee: Nicola Crane
>            Priority: Major
>              Labels: good-first-issue, good-second-issue, 
> pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> From a discussion on a PR that documents the encoding argument 
> (https://github.com/apache/arrow/pull/13038).
> Currently, if we want to specify Arrow-specific read options such as 
> encoding, we have to do something like this:
> {code:r}
> df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))
> {code}
> However, this uses a lower-level API that we don't want to include in the 
> examples for end-users to see.
>  
> We should update the code inside {{read_csv_arrow()}} so that the user can 
> specify {{read_options}} as a list which we then pass through to 
> CsvReadOptions internally, so we could instead call the much more 
> user-friendly code below:
> {code:r}
> df <- read_csv_arrow(tf, read_options = list(encoding = "utf8"))
> {code}
> We should then add an example of this to the function doc examples.
>  
> We should also do the same for parse_options and convert_options.
> Similarly, with {{open_dataset()}} we can currently do:
> {code:r}
> open_dataset("data.csv", format = "csv",
>   convert_options = CsvConvertOptions$create(
>     null_values = "Not Range", strings_can_be_null = TRUE
>   )) %>% collect()
> {code}
> but it'd be great to be able to do:
> {code:r}
> open_dataset("data.csv", format = "csv",
>   convert_options = list(
>     null_values = "Not Range", strings_can_be_null = TRUE
>   )) %>% collect()
> {code}
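
One way the list-to-options conversion described above could work is sketched below. This is only an illustration under stated assumptions, not the change that was merged: the helper name as_options() and its create argument are hypothetical and are not part of the arrow package. It relies on base R's do.call() to spread a list's elements into named arguments of a constructor.

```r
# Hypothetical helper (not part of arrow): turn a plain list into an options
# object by calling the supplied constructor with the list's elements as
# named arguments.
as_options <- function(x, create) {
  if (is.list(x)) {
    # do.call() passes each list element as a named argument to `create`
    return(do.call(create, x))
  }
  # Anything that is not a list (e.g. an already-built options object,
  # or NULL) is returned unchanged
  x
}
```

With arrow loaded, as_options(list(encoding = "utf8"), CsvReadOptions$create) would then be expected to build the same object as the CsvReadOptions$create(encoding = "utf8") call quoted in the issue, while an existing options object would pass through untouched.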



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
