[jira] [Created] (ARROW-15124) default TZ parsing woes in CSV reader

Carl Boettiger (Jira) Wed, 15 Dec 2021 16:40:12 -0800

Carl Boettiger created ARROW-15124:
--------------------------------------

             Summary: default TZ parsing woes in CSV reader
                 Key: ARROW-15124
                 URL: https://issues.apache.org/jira/browse/ARROW-15124
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 6.0.1
            Reporter: Carl Boettiger



I am attempting to use open_dataset() on a large collection of CSV files in 
which a timestamp column sometimes holds a bare date (e.g. 2021-02-01) and 
sometimes a full timestamp with a timezone designator (e.g. 
2021-02-01T00:00:00Z).

readr reads both forms without trouble when the column type is set to 
datetime (col_types = "T", see below), but arrow::read_csv_arrow() insists 
that one form must use tz="UTC" in the schema while the other must not, so no 
single schema is valid for both.  This is easiest to see in a simple example:


{code:java}
x <- tempfile()
df <- data.frame(time = '2021-02-01T00:00:00Z')
readr::write_csv(df, x)
# Timestamp column with no timezone in the schema
schema <- arrow::schema(time = arrow::timestamp("s", ""))

# ERROR: cannot parse the "Z"-suffixed value without tz="UTC" in the schema
arrow::read_csv_arrow(x, schema = schema, skip = 1)

df2 <- readr::read_csv(x, col_types = "T")  # works fine
{code}
{code:java}
df <- data.frame(time = '2021-02-01')
readr::write_csv(df, x)
# Timestamp column with tz="UTC" in the schema
schema <- arrow::schema(time = arrow::timestamp("s", "UTC"))

## ERROR: cannot parse the bare date with tz="UTC" in the schema
arrow::read_csv_arrow(x, schema = schema, skip = 1)

## Once again, readr has no issues:
df2 <- readr::read_csv(x, col_types = "T")
{code}
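
For anyone hitting the same problem, a possible stopgap is to read the column 
as a plain string and normalise it in R afterwards. The sketch below is only 
an illustration under some assumptions: parse_mixed_timestamp() is a 
hypothetical helper (not part of arrow or readr), it handles only the two 
formats shown above, and it treats bare dates as midnight UTC. It does not 
address the underlying behaviour in the CSV reader.

```r
# Hypothetical helper: parse a character vector that mixes
# "YYYY-MM-DDTHH:MM:SSZ" timestamps and bare "YYYY-MM-DD" dates.
parse_mixed_timestamp <- function(x) {
  x <- as.character(x)
  # First pass: full timestamps with a literal trailing "Z" (UTC).
  out <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  # Second pass: anything that failed is tried as a bare date,
  # interpreted as midnight UTC (an assumption of this sketch).
  is_date <- is.na(out) & !is.na(x)
  out[is_date] <- as.POSIXct(x[is_date], format = "%Y-%m-%d", tz = "UTC")
  out
}

# A file containing both forms in the same column.
x <- tempfile()
writeLines(c("time", "2021-02-01T00:00:00Z", "2021-02-01"), x)

# Read the column as a string so the CSV reader does no timestamp
# parsing, then convert in R. Skipped if arrow is not installed.
if (requireNamespace("arrow", quietly = TRUE)) {
  schema <- arrow::schema(time = arrow::string())
  df <- arrow::read_csv_arrow(x, schema = schema, skip = 1)
  df$time <- parse_mixed_timestamp(df$time)
}
```

The conversion happens in R after the data is read, so this does not help 
when the timestamp type is needed inside an Arrow dataset query.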



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
