jonkeane commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r810506760



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -115,6 +115,23 @@ register_bindings_datetime <- function() {
       return(Expression$create("strftime", x, options = list(format = format, 
locale = locale)))
     }
 
+    if (call_binding("is.integer", x)) {
+      if (inherits(x, "Expression")) {
+        call_binding(
+          "if_else",
+          call_binding_agg("all", call_binding("between", x, 1, 12))$data,
+          x,
+          abort("bla: Values are not in 1:12")
+        )

Review comment:
       I suspect in this part of the branch what you'll need to do is (re) 
assign the `call_binding` to `x` or return it directly. (thought see the 
comment below that this probably needs to happen above or we'll need to take 
into account that `label` argument.

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -115,6 +115,23 @@ register_bindings_datetime <- function() {
       return(Expression$create("strftime", x, options = list(format = format, 
locale = locale)))
     }
 
+    if (call_binding("is.integer", x)) {
+      if (inherits(x, "Expression")) {
+        call_binding(
+          "if_else",
+          call_binding_agg("all", call_binding("between", x, 1, 12))$data,
+          x,
+          abort("bla: Values are not in 1:12")
+        )
+      } else {
+        if (all(1 <= x & x <= 12)) {
+          return(x)
+        } else {
+          abort("bla2: Values are not in 1:12")
+        }
+      }
+    }

Review comment:
       Thanks for this, this is on the right track. I suspect that we might 
want this to be a bit higher in the function call, since we can use numerics 
and labels in lubridate:
   
   ```
   lubridate::month(2, label = TRUE)
   [1] Feb
   12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
   ```
   
   Could you also add some tests for `month()` taking various inputs here?
   
   A few inputs that I would start with (though I'm sure there are a bunch that 
could get us into funny corners!):
   
   ```
   month(1)
   month(1.1)
   month(4L)
   month(4L, label = TRUE)
   ```
   
   You might have already figured this out (in which case ignore this!), but an 
understanding of this will help you work on these bindings:
   
   We have a few of these helper functions in the package:
   
   * `call_function()` — this is exported and is a way to evaluate a specific 
arrow C++ compute function. This is helpful for answering questions like "Does 
this C++ compute function support this input? What does the error look like 
when it doesn?"
   * `call_binding()` — (which is unexpected) makes it easy to call one of our 
dplyr bindings, but note that `call_binding()` creates an Arrow Expression, but 
doesn't actually evaluate it (since evaluating it might be expensive, that is 
deferred to when the query is actually running). This is how we construct 
binding calls that rely on other bindings. Running this standalone will _not_ 
evaluate on the data, but running it alone is good for catching type-checking 
errors.
   * `arrow_eval()` — this will trigger actual evaluation (and is used inside 
of our dplyr evaluation code)
   
   So when you're experimenting you might call something like 
`arrow_eval(call_binding("month", Expression$scalar(1L)), arrow_mask(list()))` 
to see what happens if we were to call and evaluate our `month` dplyr binding. 
The `arrow_mask(list())` at the end is because `arrow_eval()` is a tidy-eval 
style that one needs to provide a data mask.
   
   One last thing: for this binding, since the behavior depends (at least in 
some cases) on the actual data in the column: we might want to make a (test) 
helper that wraps `arrow_eval(..., arrow_mask(list()))` that makes it easier to 
push various kinds of data through one of our dplyr bindings and see the output 
(either the values or the errors). It's possible something like this exists 
(honestly, in some ways a dplyr pipeline kind of _is_ that wrapper — just with 
a lot more) but I'm not aware of a quick helper that makes constructing + 
triggering the bindings with various inputs easy.

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,40 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester works with temporal types and integers", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = sprintf("%02i", month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )
+
+  # semester extraction from dates
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )
+  # semester extraction from months as integers
+  compare_dplyr_binding(
+    .input %>%
+      mutate(sem_month_as_int = semester(month_as_int)) %>%
+      collect(),
+    test_df
+  )
+
+  expect_error(
+    call_binding("semester", Expression$scalar(1:13))
+  )

Review comment:
       Could you assert the error that you're getting here?

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -115,6 +115,23 @@ register_bindings_datetime <- function() {
       return(Expression$create("strftime", x, options = list(format = format, 
locale = locale)))
     }
 
+    if (call_binding("is.integer", x)) {
+      if (inherits(x, "Expression")) {
+        call_binding(
+          "if_else",
+          call_binding_agg("all", call_binding("between", x, 1, 12))$data,

Review comment:
       I suspect that the `$data` isn't doing what you expect here. That's 
actually getting the _input_ to the `all` binding (not the result).
   
   We also should be careful that we don't want to go through the data more 
than once — is there a way we can construct a series of binding calls like this 
that get us to the answer (and then error if they run into a value that won't 
work?)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to