Charlie Gao created ARROW-13588:
-----------------------------------

             Summary: R empty character attributes not stored
                 Key: ARROW-13588
                 URL: https://issues.apache.org/jira/browse/ARROW-13588
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 5.0.0
         Environment: Ubuntu 20.04 R 4.1 release
            Reporter: Charlie Gao


I have come across an issue in the process of incorporating arrow in a package 
I develop.

Date-times in the POSIXct format have a 'tzone' attribute that by default is 
set to "" on creation, i.e. a length-one character vector holding the empty 
string (not NULL).

However, this attribute is not stored in the Feather file written by arrow. 
When the file is read back, 'tzone' is absent, so the original and restored 
data frames are not identical, as the reprex below shows.

I assume this is not the intended behaviour? My workaround at the moment is to 
check on reading back and set the tzone attribute to the empty string if it 
does not exist.
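
For illustration, a minimal sketch of that kind of check. The helper name 
`restore_empty_tzone` is hypothetical and not part of arrow, and the round 
trip is only simulated here by dropping the attribute by hand, so the snippet 
runs without arrow installed.

``` r
# Hypothetical helper (not part of arrow): put back the empty 'tzone'
# attribute on any POSIXct column that has none.
restore_empty_tzone <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (inherits(x, "POSIXct") && is.null(attr(x, "tzone"))) {
      attr(x, "tzone") <- ""
      df[[col]] <- x
    }
  }
  df
}

original <- data.frame(
  dates = as.POSIXct(c("2020-01-01", "2020-01-02")),
  values = 1:2
)

# Simulate what comes back from the file: same data, no 'tzone' attribute.
stripped <- original
attr(stripped$dates, "tzone") <- NULL

fixed <- restore_empty_tzone(stripped)
attr(fixed$dates, "tzone")
```

In practice the helper would be applied to the result of 
`arrow::read_feather()`. Columns that already carry a 'tzone' attribute are 
left untouched, so a non-empty time zone is not overwritten.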

Just to confirm, this is not an issue when the attribute is non-empty: in that 
case it is stored correctly.

Thanks.

``` r
dates <- as.POSIXct(c("2020-01-01", "2020-01-02", "2020-01-02"))
attributes(dates)
#> $class
#> [1] "POSIXct" "POSIXt" 
#> 
#> $tzone
#> [1] ""
values <- 1:3
original <- data.frame(dates, values)
original
#>        dates values
#> 1 2020-01-01      1
#> 2 2020-01-02      2
#> 3 2020-01-02      3

# Round trip through a temporary Feather file
tmp <- tempfile()
arrow::write_feather(original, tmp)

restored <- arrow::read_feather(tmp)

identical(original, restored)
#> [1] FALSE
waldo::compare(original, restored)
#> `attr(old$dates, 'tzone')` is a character vector ('')
#> `attr(new$dates, 'tzone')` is absent

unlink(tmp)
```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
