jonkeane commented on a change in pull request #12240:
URL: https://github.com/apache/arrow/pull/12240#discussion_r791019809



##########
File path: r/tests/testthat/test-Array.R
##########
@@ -985,3 +985,14 @@ test_that("Array to C-interface", {
   delete_arrow_schema(schema_ptr)
   delete_arrow_array(array_ptr)
 })
+
+test_that("Array coverts timestamps with missing timezone /assumed local tz correctly", {
+  withr::with_envvar(c(TZ = "America/Chicago"), {

Review comment:
   We want this to actually be unset, though, right? Similar to https://github.com/apache/arrow/blob/daa5c18e9697a6455a7a75fec19594543c17b21e/r/tests/testthat/test-Array.R#L264-L267, to simulate the case where `TZ` is unset. (Though we might want to use `TZ = NA` instead of `TZ = ""` there, since `NA` _unsets_ the variable rather than setting it to `""`.)
   
   ``` r
   Sys.getenv("TZ")
   #> [1] ""
   
   timestamp_r <- as.POSIXct("2018-10-07 19:04:05")
   timestamp_r
   #> [1] "2018-10-07 19:04:05 CDT"
   attributes(timestamp_r)
   #> $class
   #> [1] "POSIXct" "POSIXt" 
   #> 
   #> $tzone
   #> [1] ""
   as.integer(timestamp_r)
   #> [1] 1538957045
   
   Sys.setenv("TZ" = "Australia/Brisbane")
   
   timestamp_r <- as.POSIXct("2018-10-07 19:04:05")
   timestamp_r
   #> [1] "2018-10-07 19:04:05 AEST"
   attributes(timestamp_r)
   #> $class
   #> [1] "POSIXct" "POSIXt" 
   #> 
   #> $tzone
   #> [1] ""
   as.integer(timestamp_r)
   #> [1] 1538903045
   
   Sys.unsetenv("TZ")
   
   timestamp_r <- as.POSIXct("2018-10-07 19:04:05")
   timestamp_r
   #> [1] "2018-10-07 19:04:05 CDT"
   attributes(timestamp_r)
   #> $class
   #> [1] "POSIXct" "POSIXt" 
   #> 
   #> $tzone
   #> [1] ""
   as.integer(timestamp_r)
   #> [1] 1538957045
   ```
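The difference between `TZ = ""` and `TZ = NA` can be checked without withr. This is a minimal base-R sketch, assuming a Unix-alike. `with_tz()` and `tz_is_unset()` are hypothetical helpers written only for illustration (roughly what `withr::with_envvar()` does for a single variable); they are not part of arrow or withr.

```r
# Hypothetical helper: TRUE when TZ is absent from the environment
# (as opposed to being present with an empty value).
tz_is_unset <- function() is.na(Sys.getenv("TZ", unset = NA))

# Hypothetical helper: evaluate `code` with TZ set to `value`
# (NA means "unset it"), then restore the previous state on exit.
with_tz <- function(value, code) {
  old <- Sys.getenv("TZ", unset = NA)
  if (is.na(value)) Sys.unsetenv("TZ") else Sys.setenv(TZ = value)
  on.exit(
    if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old),
    add = TRUE
  )
  force(code)  # `code` is a promise, so it is evaluated only now, under the new TZ
}

with_tz("", tz_is_unset())  # FALSE on Unix-alikes: TZ exists but is empty
with_tz(NA, tz_is_unset())  # TRUE: TZ has been removed entirely
```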

##########
File path: r/tests/testthat/test-Array.R
##########
@@ -985,3 +985,14 @@ test_that("Array to C-interface", {
   delete_arrow_schema(schema_ptr)
   delete_arrow_array(array_ptr)
 })
+
+test_that("Array coverts timestamps with missing timezone /assumed local tz correctly", {
+  withr::with_envvar(c(TZ = "America/Chicago"), {
+    a <- as.POSIXct("1970-01-01 00:00:00")
+    attr(a, "tzone") <- Sys.getenv("TZ")

Review comment:
   This is a good first step, but would it be better to have two timestamps here? One created with `TZ` unset, and one where we explicitly set the timezone with `attr(b, "tzone") <- Sys.timezone()`, and then confirm that the two arrays are equal?
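A sketch of the two-timestamp idea in plain R, assuming the session zone is pinned to `America/Chicago` so the result does not depend on the machine running it. `with_session_tz()` is a hypothetical helper for this illustration. Here `b` is parsed with an explicit `tz` rather than relabelled via `attr(b, "tzone") <- Sys.timezone()`: relabelling keeps `a`'s seconds by construction, whereas parsing twice checks that both routes land on the same instant. The Arrow comparison is left as a comment, since it is the behaviour the PR is meant to establish rather than something this sketch verifies.

```r
# Hypothetical helper: evaluate `code` with TZ pinned to `tz`, then restore it.
with_session_tz <- function(tz, code) {
  old <- Sys.getenv("TZ", unset = NA)
  Sys.setenv(TZ = tz)
  on.exit(if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old), add = TRUE)
  force(code)
}

stamps <- with_session_tz("America/Chicago", {
  # a: no explicit zone, so R records tzone "" and parses in the session zone
  a <- as.POSIXct("2018-10-07 19:04:05")
  # b: the same wall-clock string with the zone spelled out
  b <- as.POSIXct("2018-10-07 19:04:05", tz = "America/Chicago")
  list(a = a, b = b)
})

attr(stamps$a, "tzone")                       # ""
attr(stamps$b, "tzone")                       # "America/Chicago"
as.numeric(stamps$a) == as.numeric(stamps$b)  # TRUE: same instant

# In the arrow test itself, the expectation would then be something like
# expect_equal(Array$create(a), Array$create(b)), once "" maps to the local zone.
```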

##########
File path: r/tests/testthat/test-Array.R
##########
@@ -985,3 +985,14 @@ test_that("Array to C-interface", {
   delete_arrow_schema(schema_ptr)
   delete_arrow_array(array_ptr)
 })
+
+test_that("Array coverts timestamps with missing timezone /assumed local tz correctly", {
+  withr::with_envvar(c(TZ = "America/Chicago"), {
+    a <- as.POSIXct("1970-01-01 00:00:00")
+    attr(a, "tzone") <- Sys.getenv("TZ")

Review comment:
   > But wouldn't those arrays be equal in their absolute value (without the "tzone" metadata)?
   
   This is what we expect and want, no? R creates a POSIXct by taking the datetime string you give it and converting it to the number of seconds since the epoch, treating the string as being in the session's local timezone (unless you explicitly provide a different one). This is what I mean when I say that, for R, timezoneless timestamps are *not* naive: they are really timestamps in a specific timezone, and R just happens to spell that timezone, confusingly, as `""` sometimes.
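The "not naive" point can be made concrete with the numbers from the reprex in the first comment: parsing the same string under two session zones yields different epoch seconds, and the gap is exactly the difference between the two UTC offsets (CDT is UTC-5, AEST is UTC+10, so 15 hours). If `""` really meant "naive", the result would not depend on the session zone at all. `epoch_seconds_in()` is a hypothetical helper for this sketch.

```r
# Hypothetical helper: parse `x` as R would with the session zone set to `tz`,
# and return the resulting seconds since the Unix epoch.
epoch_seconds_in <- function(x, tz) {
  old <- Sys.getenv("TZ", unset = NA)
  Sys.setenv(TZ = tz)
  on.exit(if (is.na(old)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old), add = TRUE)
  as.numeric(as.POSIXct(x))  # no `tz` argument: tzone "" means "session zone"
}

chicago  <- epoch_seconds_in("2018-10-07 19:04:05", "America/Chicago")     # 1538957045
brisbane <- epoch_seconds_in("2018-10-07 19:04:05", "Australia/Brisbane")  # 1538903045
(chicago - brisbane) / 3600                                                 # 15
```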

##########
File path: r/tests/testthat/test-Array.R
##########
@@ -985,3 +985,14 @@ test_that("Array to C-interface", {
   delete_arrow_schema(schema_ptr)
   delete_arrow_array(array_ptr)
 })
+
+test_that("Array coverts timestamps with missing timezone /assumed local tz correctly", {
+  withr::with_envvar(c(TZ = "America/Chicago"), {
+    a <- as.POSIXct("1970-01-01 00:00:00")
+    attr(a, "tzone") <- Sys.getenv("TZ")

Review comment:
   The phrase "the display" here is confusing (or wrong) in some circumstances. When printing arrays, AFAICT arrow currently prints datetimes in UTC regardless of whether a timezone is attached:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   # specifically setting the timezone, and the Arrow Array repl shows UTC
   ts <- as.POSIXct("2020-01-01 02:00:00", tz = "America/Chicago") + 1:10*3600
   ts
   #>  [1] "2020-01-01 03:00:00 CST" "2020-01-01 04:00:00 CST"
   #>  [3] "2020-01-01 05:00:00 CST" "2020-01-01 06:00:00 CST"
   #>  [5] "2020-01-01 07:00:00 CST" "2020-01-01 08:00:00 CST"
   #>  [7] "2020-01-01 09:00:00 CST" "2020-01-01 10:00:00 CST"
   #>  [9] "2020-01-01 11:00:00 CST" "2020-01-01 12:00:00 CST"
   attr(ts, "tzone")
   #> [1] "America/Chicago"
   
   arr <- Array$create(ts)
   arr
   #> Array
   #> <timestamp[us, tz=America/Chicago]>
   #> [
   #>   2020-01-01 09:00:00.000000,
   #>   2020-01-01 10:00:00.000000,
   #>   2020-01-01 11:00:00.000000,
   #>   2020-01-01 12:00:00.000000,
   #>   2020-01-01 13:00:00.000000,
   #>   2020-01-01 14:00:00.000000,
   #>   2020-01-01 15:00:00.000000,
   #>   2020-01-01 16:00:00.000000,
   #>   2020-01-01 17:00:00.000000,
   #>   2020-01-01 18:00:00.000000
   #> ]
   arr$type$timezone()
   #> [1] "America/Chicago"
   
   as.vector(arr)
   #>  [1] "2020-01-01 03:00:00 CST" "2020-01-01 04:00:00 CST"
   #>  [3] "2020-01-01 05:00:00 CST" "2020-01-01 06:00:00 CST"
   #>  [5] "2020-01-01 07:00:00 CST" "2020-01-01 08:00:00 CST"
   #>  [7] "2020-01-01 09:00:00 CST" "2020-01-01 10:00:00 CST"
   #>  [9] "2020-01-01 11:00:00 CST" "2020-01-01 12:00:00 CST"
   attr(as.vector(arr), "tzone")
   #> [1] "America/Chicago"
   
   
   # without setting the timezone, and the Arrow Array repl still shows UTC
   ts <- as.POSIXct("2020-01-01 02:00:00") + 1:10*3600
   ts
   #>  [1] "2020-01-01 03:00:00 CST" "2020-01-01 04:00:00 CST"
   #>  [3] "2020-01-01 05:00:00 CST" "2020-01-01 06:00:00 CST"
   #>  [5] "2020-01-01 07:00:00 CST" "2020-01-01 08:00:00 CST"
   #>  [7] "2020-01-01 09:00:00 CST" "2020-01-01 10:00:00 CST"
   #>  [9] "2020-01-01 11:00:00 CST" "2020-01-01 12:00:00 CST"
   attr(ts[[1]], "tzone")
   #> NULL
   
   arr <- Array$create(ts)
   arr
   #> Array
   #> <timestamp[us]>
   #> [
   #>   2020-01-01 09:00:00.000000,
   #>   2020-01-01 10:00:00.000000,
   #>   2020-01-01 11:00:00.000000,
   #>   2020-01-01 12:00:00.000000,
   #>   2020-01-01 13:00:00.000000,
   #>   2020-01-01 14:00:00.000000,
   #>   2020-01-01 15:00:00.000000,
   #>   2020-01-01 16:00:00.000000,
   #>   2020-01-01 17:00:00.000000,
   #>   2020-01-01 18:00:00.000000
   #> ]
   arr$type$timezone()
   #> [1] ""
   
   as.vector(arr)
   #>  [1] "2020-01-01 03:00:00 CST" "2020-01-01 04:00:00 CST"
   #>  [3] "2020-01-01 05:00:00 CST" "2020-01-01 06:00:00 CST"
   #>  [5] "2020-01-01 07:00:00 CST" "2020-01-01 08:00:00 CST"
   #>  [7] "2020-01-01 09:00:00 CST" "2020-01-01 10:00:00 CST"
   #>  [9] "2020-01-01 11:00:00 CST" "2020-01-01 12:00:00 CST"
   attr(as.vector(arr), "tzone")
   #> NULL
   ```
   
   But, as shown above, when pulling the data back into R with `as.vector(arr)`, the timezone attribute (or its absence) comes back with it, so R displays each timestamp exactly as the original was displayed.
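The UTC values in the Arrow printout and the CST values R prints are the same instants rendered in different zones. A small base-R check using only `format()` with an explicit `tz` reproduces both views of the timezone-aware example above:

```r
ts <- as.POSIXct("2020-01-01 02:00:00", tz = "America/Chicago") + 1:10 * 3600

# Render the same instants two ways: in UTC (what the Arrow printout shows)
# and in the attached "tzone" attribute (what R's own print method uses).
utc_view   <- format(ts, "%Y-%m-%d %H:%M:%S", tz = "UTC")
local_view <- format(ts, "%Y-%m-%d %H:%M:%S")

utc_view[1]        # "2020-01-01 09:00:00"
local_view[1]      # "2020-01-01 03:00:00"
attr(ts, "tzone")  # "America/Chicago": adding seconds keeps the zone
```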

##########
File path: r/tests/testthat/test-Array.R
##########
@@ -985,3 +985,14 @@ test_that("Array to C-interface", {
   delete_arrow_schema(schema_ptr)
   delete_arrow_array(array_ptr)
 })
+
+test_that("Array coverts timestamps with missing timezone /assumed local tz correctly", {
+  withr::with_envvar(c(TZ = "America/Chicago"), {
+    a <- as.POSIXct("1970-01-01 00:00:00")
+    attr(a, "tzone") <- Sys.getenv("TZ")

Review comment:
   Thanks for digging that up. I _assumed_ it already existed but hadn't gone searching.
   

##########
File path: r/tests/testthat/test-Array.R
##########
@@ -985,3 +985,14 @@ test_that("Array to C-interface", {
   delete_arrow_schema(schema_ptr)
   delete_arrow_array(array_ptr)
 })
+
+test_that("Array coverts timestamps with missing timezone /assumed local tz correctly", {
+  withr::with_envvar(c(TZ = "America/Chicago"), {
+    a <- as.POSIXct("1970-01-01 00:00:00")
+    attr(a, "tzone") <- Sys.getenv("TZ")

Review comment:
   Yeah, let's save the display fix for ARROW-14567. I also added a comment there and added the R component to it, since it should all wire up either the same way or very easily after that. Definitely out of scope for this ticket.
   
   > I think we can attach the local / system timezone when it isn't passed explicitly (and this would theoretically solve this Jira issue).
   
   This sounds like the right approach.
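An R-level sketch of the approach quoted above: fill in the session zone when a POSIXct carries no usable `tzone` before it is handed to Arrow. `attach_local_tz()` is hypothetical and only illustrates the idea; the real fix in arrow may well live in the C++ conversion code instead. It relabels without shifting the instant, which matches the R semantics described earlier in this thread.

```r
# Hypothetical helper (not part of arrow): make R's implicit local zone explicit.
# `local_tz` defaults to the session zone; pass it explicitly for reproducible tests.
attach_local_tz <- function(x, local_tz = Sys.timezone()) {
  tz <- attr(x, "tzone")
  implicit <- is.null(tz) || !nzchar(tz[[1]])
  if (implicit && !is.na(local_tz)) {
    attr(x, "tzone") <- local_tz  # only the label changes; the seconds do not
  }
  x
}

no_zone  <- .POSIXct(0)              # no tzone attribute at all
utc_zone <- .POSIXct(0, tz = "UTC")  # zone already set explicitly

attr(attach_local_tz(no_zone, "America/Chicago"), "tzone")   # "America/Chicago"
attr(attach_local_tz(utc_zone, "America/Chicago"), "tzone")  # "UTC" (unchanged)
```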
   
   




-- 
This is an automated message from the Apache Git Service.