Nic Crane created ARROW-13278:
---------------------------------
Summary: [R] open_dataset autodetects types wrong in fairly
unambiguous data
Key: ARROW-13278
URL: https://issues.apache.org/jira/browse/ARROW-13278
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Nic Crane
Assignee: Nic Crane
{code:java}
library(arrow)

# Write some partitioned data to disk to read back in
write_dataset(airquality, "airquality_partitioned", partitioning = c("Month", "Day"))

# Read the data back from the folder
air_data <- open_dataset("airquality_partitioned", partitioning = c("Month", "Day"))

> air_data
FileSystemDataset with 153 Parquet files
Ozone: int32
Solar.R: int32
Wind: double
Temp: int32
Month: string
Day: string
{code}
Month and Day are integers in airquality, and neither column contains any NA
values, yet both are read back as string. The docs for open_dataset say that
partitioning can be supplied as "a character vector that defines the field
names corresponding to those path segments (that is, you're providing the names
that would correspond to a Schema but the types will be autodetected)". Given
that, I would expect both columns to be autodetected as an integer type, so
this looks like it might be a bug somewhere.
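
For comparison, here is a minimal sketch of reading the same data with the partition types stated explicitly instead of autodetected. It assumes write_dataset used its default Hive-style directory names (hive_style = TRUE, giving paths like Month=5/Day=1), which is why it uses hive_partition(). The temporary path is only illustrative, and this is a possible workaround, not a diagnosis of the underlying cause.

```r
library(arrow)

# Illustrative scratch location (an assumption for this sketch only)
path <- file.path(tempdir(), "airquality_partitioned")

# Write the partitioned data; write_dataset defaults to Hive-style
# directory names such as Month=5/Day=1
write_dataset(airquality, path, partitioning = c("Month", "Day"))

# Declare the partition column types up front rather than relying on
# autodetection from the path segments
air_data <- open_dataset(
  path,
  partitioning = hive_partition(Month = int32(), Day = int32())
)

# Inspect the resulting schema
print(air_data$schema)
```

If Month and Day come back as int32 here, the string types above would seem to come from how the character-vector form of partitioning interprets the path segments, not from the data itself.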