[GitHub] [arrow] jonkeane commented on a change in pull request #12319: ARROW-14199 [R] bindings for format (where possible)

GitBox Fri, 04 Feb 2022 08:22:23 -0800


jonkeane commented on a change in pull request #12319:
URL: https://github.com/apache/arrow/pull/12319#discussion_r798743253




##########
File path: r/R/dplyr-funcs-type.R
##########
@@ -248,3 +249,33 @@ register_bindings_type_elementwise <- function() {
     is_inf & !call_binding("is.na", is_inf)
   })
 }
+
+register_bindings_type_format <- function() {
+  register_binding("format", function(x, ...) {
+    if (inherits(x, "Expression") &&
+        x$type_id() %in% Type[c("TIMESTAMP", "DATE32", "DATE64")]) {
+      binding_format_datetime(x, ...)
+    } else {
+      abort(paste0("`format()` not yet supported for `", class(x$type())[[1]], 
"`"))
+    }
+  })
+}
+
+binding_format_datetime <- function(x, format = "", tz = "", usetz = FALSE) {
+
+  if (usetz) {
+    format <- paste(format, "%Z")
+  }
+
+  if (call_binding("is.POSIXct", x)) {
+    if (tz == "" && x$type()$timezone() != "") {
+      tz <- x$type()$timezone()
+    } else if (tz == "") {
+      tz <- Sys.timezone()
+    }
+    ts <- Expression$create("cast", x, options = list(to_type = 
timestamp(x$type()$unit(), tz)))

Review comment:
       Especially with #12240 I don't think we'll need this cast part here. 
Even without that PR, I'm not sure there's a huge amount of benefit of assuming 
(and then converting) the timezones like R does it (and that's actually counter 
to what Arrow is supposed to do with timestamps that don't have timezones)

##########
File path: r/tests/testthat/test-dplyr-funcs-type.R
##########
@@ -768,3 +768,84 @@ test_that("nested structs can be created from scalars and 
existing data frames",
     tibble(a = 1:2)
   )
 })
+
+test_that("format date/time", {
+  skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+  times <- tibble(
+    datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Etc/GMT+6"), 
NA),
+    date = c(as.Date("2021-01-01"), NA)
+  )
+  formats <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %z %Z %j %U %W %x %X %% 
%G %V %u"
+  formats_date <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %j %U %W %x %X %% %G 
%V %u"
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(datetime, format = formats)) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(date, format = formats_date)) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(datetime, format = formats, tz = "Pacific/Marquesas")) 
%>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(datetime, format = formats, tz = "EST", usetz = TRUE)) 
%>%

Review comment:
       Like here, if we wanted to assert that formating with a different 
timezone gives us the right thing, that would be a good place to do something 
other than `"Pacific/Marquesas"`

##########
File path: r/tests/testthat/test-dplyr-funcs-type.R
##########
@@ -768,3 +768,84 @@ test_that("nested structs can be created from scalars and 
existing data frames",
     tibble(a = 1:2)
   )
 })
+
+test_that("format date/time", {
+  skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+  times <- tibble(
+    datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Etc/GMT+6"), 
NA),

Review comment:
       Is there something particular about `"Etc/GMT+6"`? We use 
`"Pacific/Marquesas"` elsewhere to be 1. a timezone most people don't already 
have configured and 2. a timezone that is different from UTC and 3. a timezone 
that isn't even on the hour all of which to attempt to catch places where we 
assume any of those three. _Generally_ if we're picking a random timezone we 
should favor `"Pacific/Marquesas"` unless there are specific reasons to do 
something else.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] jonkeane commented on a change in pull request #12319: ARROW-14199 [R] bindings for format (where possible)

Reply via email to