[GitHub] [arrow] jonkeane commented on a change in pull request #12482: ARROW-15701 [R] month() should allow integer inputs

GitBox Tue, 08 Mar 2022 06:20:48 -0800


jonkeane commented on a change in pull request #12482:
URL: https://github.com/apache/arrow/pull/12482#discussion_r821704331




##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -105,17 +105,39 @@ register_bindings_datetime <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
-  register_binding("month", function(x, label = FALSE, abbr = TRUE, locale = 
Sys.getlocale("LC_TIME")) {
+  register_binding("month", function(x,
+                                     label = FALSE,
+                                     abbr = TRUE,
+                                     locale = Sys.getlocale("LC_TIME")) {
+
+    if (call_binding("is.integer", x)) {
+      x <- call_binding("if_else",
+                        call_binding("between", x, 1, 12),
+                        x,
+                        NA_integer_)
+      if (!label) {
+        # if we don't need a label we can return the integer itself 
(constrained
+        # to 1:12)
+        return(x)
+      } else {
+        # make the integer into a date32() - which interprets integers as
+        # days from epoch (we multiply by 28 to be able to later extract the
+        # month with label) - NB this builds a false date (to be used by 
strftime)
+        # since we only know and care about the month
+        x <- build_expr("cast", x * 28L, options = cast_options(to_type = 
date32()))
+      }

Review comment:
       This is minor, and a matter of style so feel free to take it or leave 
it: I tend to do:
   
   ```
   if (condition) {
     return(...)
   }
   
   # code that happens next
   ```
   
   I find that makes the early return stand out a bit more, and saves us some 
indents on the rest of the larger function.
   
   Like I said, you can take it or leave it, that's very much personal 
preference.

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -820,6 +816,80 @@ test_that("dst extracts daylight savings time correctly", {
   )
 })
 
+test_that("month() supports integer input", {
+    test_df_month <- tibble(
+      month_as_int = c(1:12, NA)
+    )
+
+    compare_dplyr_binding(
+      .input %>%
+        mutate(month_int_input = month(month_as_int)) %>%
+        collect(),
+      test_df_month
+    )
+
+    skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+    compare_dplyr_binding(
+      .input %>%
+        # R returns ordered factor whereas Arrow returns character
+        mutate(
+          month_int_input = as.character(month(month_as_int, label = TRUE))
+        ) %>%
+        collect(),
+      test_df_month
+    )
+
+    compare_dplyr_binding(
+      .input %>%
+        # R returns ordered factor whereas Arrow returns character
+        mutate(
+          month_int_input = as.character(
+            month(month_as_int, label = TRUE, abbr = FALSE)
+          )
+        ) %>%
+        collect(),
+      test_df_month
+    )
+  })
+
+test_that("month() errors with double input and returns NA with int outside 
1:12", {
+  test_df_month <- tibble(
+    month_as_int = c(-1L, 1L, 13L, NA),
+    month_as_double = month_as_int + 0.1
+  )
+
+  expect_error(
+    test_df_month %>%
+      arrow_table() %>%
+      mutate(month_dbl_input = month(month_as_double)) %>%
+      collect(),
+    regexp = "Function 'month' has no kernel matching input types 
(array[double])",
+    fixed = TRUE
+  )
+
+  expect_error(
+    test_df_month %>%
+      record_batch() %>%
+      mutate(month_dbl_input = month(month_as_double)) %>%
+      collect(),
+    regexp = "Function 'month' has no kernel matching input types 
(array[double])",
+    fixed = TRUE
+  )
+
+  expect_equal(
+    test_df_month %>%
+      arrow_table() %>%
+      select(month_as_int) %>%
+      mutate(month_int_input = month(month_as_int)) %>%
+      collect(),
+    tibble(
+      month_as_int = c(-1L, 1L, 13L, NA),
+      month_int_input = c(NA, 1L, NA, NA)
+    )
+  )

Review comment:
       Again, a personal style thing, so take it or leave it, but I wonder if 
we should put this up at the top of this block (or possibly in the block above) 
so that you have the order {tests expected to work}, {tests expected to error}.
   
   When I first read this block I was like "Oh this is the errors block" — it's 
actually not (and you mention that in the title), but that's what I thought at 
first was surprised to see an `expect_equal()` here.
   
   Up to you if you want to take on that style as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] jonkeane commented on a change in pull request #12482: ARROW-15701 [R] month() should allow integer inputs

Reply via email to