nealrichardson commented on code in PR #13160:
URL: https://github.com/apache/arrow/pull/13160#discussion_r921504256
##########
r/tests/testthat/test-dplyr-filter.R:
##########
@@ -239,6 +239,14 @@ test_that("filter() with between()", {
filter(between(chr, 1, 2)) %>%
collect()
)
+
+ # with namespacing
+ expect_error(
Review Comment:
Should this assert the error message? As in, this test would pass on master
because it would error, but the error message would be different right?
##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -201,6 +212,14 @@ test_that("strftime", {
times
)
+ # with namespacing
Review Comment:
What if, instead of duplicating the whole test, you just added a column to
the mutate(), like:
```
compare_dplyr_binding(
.input %>%
mutate(
x = strftime(datetime, format = formats),
x2 = base::strftime(datetime, format = formats),
) %>%
collect(),
times
)
```
##########
r/tests/testthat/test-dplyr-funcs-conditional.R:
##########
@@ -192,6 +212,25 @@ test_that("case_when()", {
tbl
)
+ # with namespacing
+ compare_dplyr_binding(
+ .input %>%
+ transmute(cw = dplyr::case_when(chr %in% letters[1:3] ~ 1L) + 41L) %>%
+ collect(),
+ tbl
+ )
+
+ compare_dplyr_binding(
Review Comment:
Why do we need more than one test that you can call the namespaced version?
##########
r/vignettes/developers/bindings.Rmd:
##########
@@ -191,11 +191,11 @@ As `startsWith()` requires options, direct mapping is not
appropriate.
If the function cannot be mapped directly, some extra work may be needed to
ensure that calling the arrow version of the function results in the same
result
as calling the R version of the function. In this case, the function will
need
-adding to the `nse_funcs` list in `arrow/r/R/dplyr-functions.R`. Here is how
-this might look for `startsWith()`:
+adding to the `nse_funcs` function register. Here is how this might look for
Review Comment:
```suggestion
adding to the `nse_funcs` function registry. Here is how this might look
for
```
##########
r/tests/testthat/test-util.R:
##########
@@ -39,3 +39,34 @@ test_that("as_writable_table() errors for invalid input", {
# make sure other errors make it through
expect_snapshot_error(wrapper_fun(data.frame(x = I(list(1, "a")))))
})
+
+test_that("all_funs() identifies namespace-qualified and unqualified
functions", {
+ expect_equal(
+ arrow:::all_funs(rlang::quo(pkg::fun())),
Review Comment:
You don't need `arrow:::`, testthat runs in the package namespace
```suggestion
all_funs(rlang::quo(pkg::fun())),
```
##########
r/R/dplyr-funcs.R:
##########
@@ -58,15 +58,34 @@ NULL
#' @keywords internal
#'
register_binding <- function(fun_name, fun, registry = nse_funcs) {
- name <- gsub("^.*?::", "", fun_name)
- namespace <- gsub("::.*$", "", fun_name)
+ qualified_name <- fun_name
+ if (qualified_name == "::") {
+ unqualified_name <- "::"
+ } else {
+ unqualified_name <- gsub("^.*?::", "", qualified_name)
+ }
- previous_fun <- if (name %in% names(registry)) registry[[name]] else NULL
+ previous_fun <- if (unqualified_name %in% names(registry))
registry[[unqualified_name]] else NULL
Review Comment:
Isn't this just?
```suggestion
previous_fun <- registry[[unqualified_name]]
```
##########
r/R/dplyr-funcs.R:
##########
@@ -58,15 +58,34 @@ NULL
#' @keywords internal
#'
register_binding <- function(fun_name, fun, registry = nse_funcs) {
- name <- gsub("^.*?::", "", fun_name)
- namespace <- gsub("::.*$", "", fun_name)
+ qualified_name <- fun_name
Review Comment:
I think you can simplify this function if you just deal with `::`
separately. (I'll suggest below.) There's a lot of special casing that you
could drop.
##########
r/R/dplyr-funcs.R:
##########
@@ -116,3 +136,18 @@ create_binding_cache <- function() {
nse_funcs <- new.env(parent = emptyenv())
agg_funcs <- new.env(parent = emptyenv())
.cache <- new.env(parent = emptyenv())
+
+# we register 2 version of the "::" binding - one for use with nse_funcs
(below)
+# and another one for use with agg_funcs (in dplyr-summarize.R)
+register_bindings_utils <- function() {
+ register_binding("::", function(lhs, rhs) {
+ lhs_name <- as.character(substitute(lhs))
+ rhs_name <- as.character(substitute(rhs))
+
+ fun_name <- paste0(lhs_name, "::", rhs_name)
+
+ # if we do not have a binding for pkg::fun, then fall back on to the
+ # regular pkg::fun function
+ nse_funcs[[fun_name]] %||% asNamespace(lhs_name)[[rhs_name]]
+ })
Review Comment:
See above comment
```suggestion
nse_funcs[["::"] <- function(lhs, rhs) {
lhs_name <- as.character(substitute(lhs))
rhs_name <- as.character(substitute(rhs))
fun_name <- paste0(lhs_name, "::", rhs_name)
# if we do not have a binding for pkg::fun, then fall back on to the
# regular pkg::fun function
nse_funcs[[fun_name]] %||% asNamespace(lhs_name)[[rhs_name]]
}
```
##########
r/R/dplyr-funcs.R:
##########
@@ -58,15 +58,34 @@ NULL
#' @keywords internal
#'
register_binding <- function(fun_name, fun, registry = nse_funcs) {
- name <- gsub("^.*?::", "", fun_name)
- namespace <- gsub("::.*$", "", fun_name)
+ qualified_name <- fun_name
+ if (qualified_name == "::") {
+ unqualified_name <- "::"
+ } else {
+ unqualified_name <- gsub("^.*?::", "", qualified_name)
+ }
- previous_fun <- if (name %in% names(registry)) registry[[name]] else NULL
+ previous_fun <- if (unqualified_name %in% names(registry))
registry[[unqualified_name]] else NULL
+
+ # if th unqualified name exists in the register, warn
+ if (!is.null(fun) && !is.null(previous_fun)) {
+ warn(
+ paste0(
+ "A \"",
+ unqualified_name,
+ "\" binding already exists in the register and will be overwritten.")
+ )
+ }
+ # if fun is NULL remove entries from the function registry
Review Comment:
Is this a real use case? Do we test this?
##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -937,15 +1175,9 @@ test_that("date works in arrow", {
r_date_object <- lubridate::ymd_hms("2012-03-26 23:12:13")
- # we can't (for now) use namespacing, so we need to make sure
lubridate::date()
- # and not base::date() is being used. This is due to the way testthat runs
and
- # normal use of arrow would not have to do this explicitly.
- # TODO: remove after ARROW-14575
Review Comment:
🎉
##########
r/R/dplyr-summarize.R:
##########
@@ -348,7 +362,7 @@ summarize_eval <- function(name, quosure, ctx, hash) {
# the list output from the Arrow hash_tdigest kernel to flatten it into a
# column of type float64. We do that by modifying the unevaluated expression
# to replace quantile(...) with arrow_list_element(quantile(...), 0L)
- if (hash && "quantile" %in% funs_in_expr) {
+ if (hash && ("quantile" %in% funs_in_expr || "stats::quantile" %in%
funs_in_expr)) {
Review Comment:
```suggestion
if (hash && any(c("quantile", "stats::quantile") %in% funs_in_expr)) {
```
##########
r/R/dplyr-funcs.R:
##########
@@ -58,15 +58,34 @@ NULL
#' @keywords internal
#'
register_binding <- function(fun_name, fun, registry = nse_funcs) {
- name <- gsub("^.*?::", "", fun_name)
- namespace <- gsub("::.*$", "", fun_name)
+ qualified_name <- fun_name
+ if (qualified_name == "::") {
+ unqualified_name <- "::"
+ } else {
+ unqualified_name <- gsub("^.*?::", "", qualified_name)
+ }
- previous_fun <- if (name %in% names(registry)) registry[[name]] else NULL
+ previous_fun <- if (unqualified_name %in% names(registry))
registry[[unqualified_name]] else NULL
+
+ # if th unqualified name exists in the register, warn
Review Comment:
```suggestion
# if the unqualified name exists in the register, warn
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]