[ https://issues.apache.org/jira/browse/ARROW-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicola Crane updated ARROW-15124:
---------------------------------
Summary: [R] default TZ parsing woes in CSV reader (was: default TZ parsing woes in CSV reader)
> [R] default TZ parsing woes in CSV reader
> -----------------------------------------
>
> Key: ARROW-15124
> URL: https://issues.apache.org/jira/browse/ARROW-15124
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 6.0.1
> Reporter: Carl Boettiger
> Priority: Major
>
> I am attempting to use open_dataset() on a large collection of CSV files in
> which a timestamp column sometimes holds a plain date (e.g. 2021-02-01) and
> sometimes a full timestamp with a UTC designator (e.g. 2021-02-01T00:00:00Z).
> readr reads both without complaint when the column type is set to timestamp
> (see below), but read_csv_arrow() only accepts the schema if the "Z"-suffixed
> values are declared with tz = "UTC" and the date-only values are declared
> without it. This is easiest to see in a simple example:
> {code:java}
> x <- tempfile()
> df <- data.frame(time = '2021-02-01T00:00:00Z')
> readr::write_csv(df, x)
> schema <- arrow::schema(time = arrow::timestamp("s", ""))
> # ERROR: cannot parse the "Z"-suffixed value unless the schema uses tz = "UTC"
> arrow::read_csv_arrow(x, schema = schema, skip = 1)
> df2 <- readr::read_csv(x, col_types = "T") # works fine
> {code}
> {code:java}
> df <- data.frame(time = '2021-02-01')
> readr::write_csv(df, x)
> ## ERROR: cannot parse the date-only value when the schema uses tz = "UTC":
> schema <- arrow::schema(time = arrow::timestamp("s", "UTC"))
> arrow::read_csv_arrow(x, schema = schema, skip = 1)
> ## Once again, readr has no issues:
> df2 <- readr::read_csv(x, col_types="T")
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)