[
https://issues.apache.org/jira/browse/ARROW-18266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nicola Crane updated ARROW-18266:
---------------------------------
Description:
It's not all that clear from our docs that if we want to read in a Parquet file
and change the schema, we need to call the {{cast()}} method on the Table, e.g.
{code:r}
# Write out data
data <- tibble::tibble(x = c(letters[1:5], NA), y = 1:6)
data_with_schema <- arrow_table(data, schema = schema(x = string(), y = int64()))
write_parquet(data_with_schema, "data_with_schema.parquet")
# Read in data while specifying a schema
data_in <- read_parquet("data_with_schema.parquet", as_data_frame = FALSE)
data_in$cast(target_schema = schema(x = string(), y = int32()))
{code}
We should document this more clearly. Perhaps we could even update the code here
to automatically do some of this if we pass in a schema to the {{...}} argument
of {{read_parquet}} _and_ the returned data doesn't match the desired schema?
was:
It's not all that clear from our docs that if we want to read in a Parquet file
and change the schema, we need to call the {{cast()}} method on the Table, e.g.
{code:r}
# Write out data
data <- tibble::tibble(x = c(letters[1:5], NA), y = 1:6)
data_with_schema <- arrow_table(data, schema = schema(x = string(), y = int64()))
write_parquet(data_with_schema, "data_with_schema.parquet")
# Read in data while specifying a schema
data_in <- read_parquet("data_with_schema.parquet", as_data_frame = FALSE)
data_in$cast(target_schema = schema(x = string(), y = int32()))
{code}
We should document this more clearly. Perhaps we could even update the code here
to automatically do some of this if we pass in a schema to the {...} argument
of {{read_parquet}} _and_ the returned data doesn't match the desired schema?
> [R] Make it more obvious how to read in a Parquet file with a different
> schema to the inferred one
> --------------------------------------------------------------------------------------------------
>
> Key: ARROW-18266
> URL: https://issues.apache.org/jira/browse/ARROW-18266
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Nicola Crane
> Priority: Major
>
> It's not all that clear from our docs that if we want to read in a Parquet
> file and change the schema, we need to call the {{cast()}} method on the
> Table, e.g.
> {code:r}
> # Write out data
> data <- tibble::tibble(x = c(letters[1:5], NA), y = 1:6)
> data_with_schema <- arrow_table(data, schema = schema(x = string(), y = int64()))
> write_parquet(data_with_schema, "data_with_schema.parquet")
> # Read in data while specifying a schema
> data_in <- read_parquet("data_with_schema.parquet", as_data_frame = FALSE)
> data_in$cast(target_schema = schema(x = string(), y = int32()))
> {code}
> We should document this more clearly. Perhaps we could even update the code
> here to automatically do some of this if we pass in a schema to the {{...}}
> argument of {{read_parquet}} _and_ the returned data doesn't match the
> desired schema?