[ 
https://issues.apache.org/jira/browse/ARROW-15602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488888#comment-17488888
 ] 

Nicola Crane commented on ARROW-15602:
--------------------------------------

Apologies, my previous example contained a stray extra space, which caused 
{{x}} to be read in as a character column.  Below is a new example without that 
error.  Here, the data can be read in by explicitly supplying a schema instead 
of the readr compact specification:

{code:r}
library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
tf <- tempfile()
writeLines("x \n2004-04-01T12:00+09:00", tf)
readr::read_csv(tf, col_names = "x", skip = 1, col_types = "T")
#> # A tibble: 1 × 1
#>   x                  
#>   <dttm>             
#> 1 2004-04-01 03:00:00
arrow::read_csv_arrow(tf, col_names = "x", skip = 1, col_types = "T")
#> Error in `handle_csv_read_error()`:
#> ! Invalid: In CSV column #0: CSV conversion error to timestamp[s]: expected no zone offset in '2004-04-01T12:00+09:00'
#> /home/nic2/arrow/cpp/src/arrow/csv/converter.cc:550  decoder_.Decode(data, size, quoted, &value)
#> /home/nic2/arrow/cpp/src/arrow/csv/parser.h:123  status
#> /home/nic2/arrow/cpp/src/arrow/csv/converter.cc:554  parser.VisitColumn(col_index, visit)
arrow::read_csv_arrow(tf, skip = 1, schema = schema(x = timestamp(timezone="+09:00")))
#> # A tibble: 1 × 1
#>   x                  
#>   <dttm>             
#> 1 2004-04-01 03:00:00
{code}
 
The error occurs because the readr compact specification "T" is mapped to an 
Arrow {{timestamp}} type with no timezone, whereas the data includes a UTC 
offset.  We should look to document this behaviour.
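
As a minimal sketch of the distinction, the two types involved can be compared 
directly.  The variable names {{naive}} and {{aware}} are illustrative only, and 
the assumption that "T" corresponds to {{timestamp(unit = "s")}} is inferred from 
the {{timestamp[s]}} in the error message above rather than checked against the 
package source:

{code:r}
library(arrow)

# Assumed equivalent of the compact "T" specification (inferred from the
# error message): second precision, no timezone.
naive <- timestamp(unit = "s")

# Timezone-aware type, as used in the working schema example.
aware <- timestamp(unit = "s", timezone = "+09:00")

# Print the string form of each type.
naive$ToString()
aware$ToString()

# Compare the two types; they differ in their timezone field.
naive$Equals(aware)
{code}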

> [R] can't read timestamp with timezone from CSV (or other delimited) file
> -------------------------------------------------------------------------
>
>                 Key: ARROW-15602
>                 URL: https://issues.apache.org/jira/browse/ARROW-15602
>             Project: Apache Arrow
>          Issue Type: Improvement
>         Environment: R version 4.1.2 (2021-11-01)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.3 LTS
>            Reporter: SHIMA Tatsuya
>            Priority: Major
>
> The following values in a csv file can be read as timestamp by 
> `pyarrow.csv.read_csv` and `readr::read_csv`, but not by 
> `arrow::read_csv_arrow`.
> {code}
> "x"
> "2004-04-01T12:00+09:00"
> {code}



