[
https://issues.apache.org/jira/browse/ARROW-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nicola Crane updated ARROW-16480:
---------------------------------
Description:
From a discussion on a PR which documents the encoding argument
([https://github.com/apache/arrow/pull/13038])
Currently, if we want to specify Arrow-specific read options such as encoding,
we have to do something like this:
{code:r}
df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))
{code}
However, this uses a lower-level API that we don't want to include in the
examples for end-users to see.
We should update the code inside {{read_csv_arrow()}} so that the user can
specify {{read_options}} as a list, which we then pass through to
{{CsvReadOptions}} internally. Users could then write the much more
user-friendly code below:
{code:r}
df <- read_csv_arrow(tf, read_options = list(encoding = "utf8"))
{code}
We should then add an example of this to the function doc examples.
We should also do the same for {{parse_options}} and {{convert_options}}.
Similarly, with {{open_dataset()}} we currently have to do:
{code:r}
open_dataset(
  "data.csv",
  format = "csv",
  convert_options = CsvConvertOptions$create(null_values = "Not Range", strings_can_be_null = TRUE)
) %>%
  collect()
{code}
but it'd be great to be able to do:
{code:r}
open_dataset(
  "data.csv",
  format = "csv",
  convert_options = list(null_values = "Not Range", strings_can_be_null = TRUE)
) %>%
  collect()
{code}
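A minimal sketch of how the list handling proposed above could work, for discussion only. {{as_csv_options()}} is a hypothetical helper name, not an existing arrow function. It assumes that each options class exposes a {{$create()}} method whose argument names match the names in the list, as {{CsvReadOptions$create()}} and {{CsvConvertOptions$create()}} do in the examples above.
{code:r}
library(arrow)

# Hypothetical helper (not part of arrow): turn a plain list into an options
# object via the generator's $create() method; pass anything else through.
as_csv_options <- function(x, generator) {
  if (is.null(x)) {
    # No options supplied: fall back to the generator's defaults
    return(generator$create())
  }
  if (is.list(x)) {
    # Named list elements become arguments to $create()
    return(do.call(generator$create, x))
  }
  # Already an options object (R6 objects are environments, not lists)
  x
}

read_options <- as_csv_options(list(encoding = "utf8"), CsvReadOptions)
convert_options <- as_csv_options(
  list(null_values = "Not Range", strings_can_be_null = TRUE),
  CsvConvertOptions
)
{code}
With this approach, existing calls that pass {{CsvReadOptions$create(...)}} directly would keep working, because non-list inputs are returned unchanged.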
was:
From a discussion on a PR which documents the encoding argument
([https://github.com/apache/arrow/pull/13038])
Currently if we want to specify Arrow-specific read options such as encoding,
we'd have to do something like this:
{code:r}
df <- read_csv_arrow(tf, read_options = CsvReadOptions$create(encoding = "utf8"))
{code}
However, this uses a lower-level API that we don't want to include in the
examples for end-users to see.
We should update the code inside {{read_csv_arrow()}} so that the user can
specify {{read_options}} as a list which we then pass through to CsvReadOptions
internally, so we could instead call the much more user-friendly code below:
{code:r}
df <- read_csv_arrow(tf, read_options = list(encoding = "utf8"))
{code}
We should then add an example of this to the function doc examples.
We also should do the same for parse_options and convert_options.
> [R] Update read_csv_arrow and open_dataset parse_options, read_options, and
> convert_options to take lists
> ---------------------------------------------------------------------------------------------------------
>
> Key: ARROW-16480
> URL: https://issues.apache.org/jira/browse/ARROW-16480
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: R
> Reporter: Nicola Crane
> Priority: Major
> Labels: good-first-issue, good-second-issue