jonkeane commented on a change in pull request #12319:
URL: https://github.com/apache/arrow/pull/12319#discussion_r798743253
##########
File path: r/R/dplyr-funcs-type.R
##########
@@ -248,3 +249,33 @@ register_bindings_type_elementwise <- function() {
is_inf & !call_binding("is.na", is_inf)
})
}
+
+register_bindings_type_format <- function() {
+ register_binding("format", function(x, ...) {
+ if (inherits(x, "Expression") &&
+ x$type_id() %in% Type[c("TIMESTAMP", "DATE32", "DATE64")]) {
+ binding_format_datetime(x, ...)
+ } else {
+ abort(paste0("`format()` not yet supported for `", class(x$type())[[1]],
"`"))
+ }
+ })
+}
+
+binding_format_datetime <- function(x, format = "", tz = "", usetz = FALSE) {
+
+ if (usetz) {
+ format <- paste(format, "%Z")
+ }
+
+ if (call_binding("is.POSIXct", x)) {
+ if (tz == "" && x$type()$timezone() != "") {
+ tz <- x$type()$timezone()
+ } else if (tz == "") {
+ tz <- Sys.timezone()
+ }
+ ts <- Expression$create("cast", x, options = list(to_type =
timestamp(x$type()$unit(), tz)))
Review comment:
Especially with #12240 I don't think we'll need this cast part here.
Even without that PR, I'm not sure there's a huge amount of benefit of assuming
(and then converting) the timezones like R does it (and that's actually counter
to what Arrow is supposed to do with timestamps that don't have timezones)
##########
File path: r/tests/testthat/test-dplyr-funcs-type.R
##########
@@ -768,3 +768,84 @@ test_that("nested structs can be created from scalars and
existing data frames",
tibble(a = 1:2)
)
})
+
+test_that("format date/time", {
+ skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+ times <- tibble(
+ datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Etc/GMT+6"),
NA),
+ date = c(as.Date("2021-01-01"), NA)
+ )
+ formats <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %z %Z %j %U %W %x %X %%
%G %V %u"
+ formats_date <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %j %U %W %x %X %% %G
%V %u"
+
+ compare_dplyr_binding(
+ .input %>%
+ mutate(x = format(datetime, format = formats)) %>%
+ collect(),
+ times
+ )
+
+ compare_dplyr_binding(
+ .input %>%
+ mutate(x = format(date, format = formats_date)) %>%
+ collect(),
+ times
+ )
+
+ compare_dplyr_binding(
+ .input %>%
+ mutate(x = format(datetime, format = formats, tz = "Pacific/Marquesas"))
%>%
+ collect(),
+ times
+ )
+
+ compare_dplyr_binding(
+ .input %>%
+ mutate(x = format(datetime, format = formats, tz = "EST", usetz = TRUE))
%>%
Review comment:
Like here, if we wanted to assert that formating with a different
timezone gives us the right thing, that would be a good place to do something
other than `"Pacific/Marquesas"`
##########
File path: r/tests/testthat/test-dplyr-funcs-type.R
##########
@@ -768,3 +768,84 @@ test_that("nested structs can be created from scalars and
existing data frames",
tibble(a = 1:2)
)
})
+
+test_that("format date/time", {
+ skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+ times <- tibble(
+ datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Etc/GMT+6"),
NA),
Review comment:
Is there something particular about `"Etc/GMT+6"`? We use
`"Pacific/Marquesas"` elsewhere to be 1. a timezone most people don't already
have configured and 2. a timezone that is different from UTC and 3. a timezone
that isn't even on the hour all of which to attempt to catch places where we
assume any of those three. _Generally_ if we're picking a random timezone we
should favor `"Pacific/Marquesas"` unless there are specific reasons to do
something else.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]