bkietz commented on a change in pull request #10547:
URL: https://github.com/apache/arrow/pull/10547#discussion_r654573800



##########
File path: r/R/dplyr-functions.R
##########
@@ -215,6 +215,54 @@ nse_funcs$nchar <- function(x, type = "chars", allowNA = 
FALSE, keepNA = NA) {
   }
 }
 
+nse_funcs$paste <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE) 
{
+  assert_that(is.null(collapse))
+  arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., sep)
+}
+
+nse_funcs$paste0 <- function(..., collapse = NULL, recycle0 = FALSE) {
+  assert_that(is.null(collapse))
+  arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., "")
+}
+
+nse_funcs$str_c <- function(..., sep = "", collapse = NULL) {
+  assert_that(is.null(collapse))
+  arrow_string_join_function(NullHandlingBehavior$EMIT_NULL)(..., sep)
+}
+
+arrow_string_join_function <- function(null_handling, null_replacement = NULL) 
{
+  # the `binary_join_element_wise` Arrow C++ compute kernel takes the separator
+  # as the last argument, so pass `sep` as the last dots arg to this function
+  function(...) {
+    args <- lapply(list(...), function(arg) {
+      # handle scalar literal args, and cast all args to string for
+      # consistency with base::paste(), base::paste0(), and stringr::str_c()
+      if (!inherits(arg, "Expression")) {
+        assert_that(
+          length(arg) == 1,
+          msg = "Literal vectors of length != 1 not supported in string 
concatenation"
+        )
+        # handle scalar literal NA consistent with the binary_join_element_wise
+        # kernel's handling of nulls in the data
+        if (null_handling == NullHandlingBehavior$REPLACE && is.na(arg)) {
+          arg <- null_replacement
+        }

Review comment:
       This comment is incorrect:
   ```c++
   -      // This is redundant but expression simplification uses this to 
potentially replace
   -      // calls with null
   +      // binary_join_element_wise allocates its own null bitmap (and only 
produces an
   +      // intersection of input null bitmaps for JoinOptions::EMIT_NULL)
   ```
   If we see an error in expression simplification, that should
   be repaired in the expression simplification code. Kernels should never be 
required
   to make concessions for how expressions will consume them.
   
   In this case the flag `kernel.null_handling = NullHandling::INTERSECTION;` 
is not correct
   for `binary_join_element_wise` since `JoinOptions::SKIP, REPLACE` don't 
satisfy that
   constraint. If a kernel does satisfy `INTERSECTION`, replacing calls with 
nulls will never
   be an error.
   
   Sidebar: this is an example of why we might alter Function::Dispatch* to 
examine FunctionOptions
   as well as argument types. NullHandling::INTERSECTION is more efficient when 
it's
   available and we could break `binary_join_element_wise`'s kernels out into 
one for each
   `JoinOptions::NullHandlingBehavior`. This would reduce branching inside the 
kernel as
   well as giving the kernel for `JoinOptions::EMIT_NULL` a boost
   
   

##########
File path: r/R/dplyr-functions.R
##########
@@ -215,6 +215,54 @@ nse_funcs$nchar <- function(x, type = "chars", allowNA = 
FALSE, keepNA = NA) {
   }
 }
 
+nse_funcs$paste <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE) 
{
+  assert_that(is.null(collapse))
+  arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., sep)
+}
+
+nse_funcs$paste0 <- function(..., collapse = NULL, recycle0 = FALSE) {
+  assert_that(is.null(collapse))
+  arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., "")
+}
+
+nse_funcs$str_c <- function(..., sep = "", collapse = NULL) {
+  assert_that(is.null(collapse))
+  arrow_string_join_function(NullHandlingBehavior$EMIT_NULL)(..., sep)
+}
+
+arrow_string_join_function <- function(null_handling, null_replacement = NULL) 
{
+  # the `binary_join_element_wise` Arrow C++ compute kernel takes the separator
+  # as the last argument, so pass `sep` as the last dots arg to this function
+  function(...) {
+    args <- lapply(list(...), function(arg) {
+      # handle scalar literal args, and cast all args to string for
+      # consistency with base::paste(), base::paste0(), and stringr::str_c()
+      if (!inherits(arg, "Expression")) {
+        assert_that(
+          length(arg) == 1,
+          msg = "Literal vectors of length != 1 not supported in string 
concatenation"
+        )
+        # handle scalar literal NA consistent with the binary_join_element_wise
+        # kernel's handling of nulls in the data
+        if (null_handling == NullHandlingBehavior$REPLACE && is.na(arg)) {
+          arg <- null_replacement
+        }

Review comment:
       This comment is incorrect:
   ```c++
   -      // This is redundant but expression simplification uses this to 
potentially replace
   -      // calls with null
   +      // binary_join_element_wise allocates its own null bitmap (and only 
produces an
   +      // intersection of input null bitmaps for JoinOptions::EMIT_NULL)
   ```
   If we see an error in expression simplification, that should be repaired in 
the expression simplification code. Kernels should never be required to make 
concessions for how expressions will consume them.
   
   In this case the flag `kernel.null_handling = NullHandling::INTERSECTION;` 
is not correct for `binary_join_element_wise` since `JoinOptions::SKIP, 
REPLACE` don't satisfy that constraint. If a kernel does satisfy 
`INTERSECTION`, replacing calls with nulls will never be an error.
   
   Sidebar: this is an example of why we might alter Function::Dispatch* to 
examine FunctionOptions as well as argument types. NullHandling::INTERSECTION 
is more efficient when it's available and we could break 
`binary_join_element_wise`'s kernels out into one for each 
`JoinOptions::NullHandlingBehavior`. This would reduce branching inside the 
kernel as well as giving the kernel for `JoinOptions::EMIT_NULL` a boost

##########
File path: r/R/dplyr-functions.R
##########
@@ -215,6 +215,54 @@ nse_funcs$nchar <- function(x, type = "chars", allowNA = 
FALSE, keepNA = NA) {
   }
 }
 
+nse_funcs$paste <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE) 
{
+  assert_that(is.null(collapse))
+  arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., sep)
+}
+
+nse_funcs$paste0 <- function(..., collapse = NULL, recycle0 = FALSE) {
+  assert_that(is.null(collapse))
+  arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., "")
+}
+
+nse_funcs$str_c <- function(..., sep = "", collapse = NULL) {
+  assert_that(is.null(collapse))
+  arrow_string_join_function(NullHandlingBehavior$EMIT_NULL)(..., sep)
+}
+
+arrow_string_join_function <- function(null_handling, null_replacement = NULL) 
{
+  # the `binary_join_element_wise` Arrow C++ compute kernel takes the separator
+  # as the last argument, so pass `sep` as the last dots arg to this function
+  function(...) {
+    args <- lapply(list(...), function(arg) {
+      # handle scalar literal args, and cast all args to string for
+      # consistency with base::paste(), base::paste0(), and stringr::str_c()
+      if (!inherits(arg, "Expression")) {
+        assert_that(
+          length(arg) == 1,
+          msg = "Literal vectors of length != 1 not supported in string 
concatenation"
+        )
+        # handle scalar literal NA consistent with the binary_join_element_wise
+        # kernel's handling of nulls in the data
+        if (null_handling == NullHandlingBehavior$REPLACE && is.na(arg)) {
+          arg <- null_replacement
+        }

Review comment:
       This comment is incorrect:
   ```diff
   -      // This is redundant but expression simplification uses this to 
potentially replace
   -      // calls with null
   +      // binary_join_element_wise allocates its own null bitmap (and only 
produces an
   +      // intersection of input null bitmaps for JoinOptions::EMIT_NULL)
   ```
   If we see an error in expression simplification, that should be repaired in 
the expression simplification code. Kernels should never be required to make 
concessions for how expressions will consume them.
   
   In this case the flag `kernel.null_handling = NullHandling::INTERSECTION;` 
is not correct for `binary_join_element_wise` since `JoinOptions::SKIP, 
REPLACE` don't satisfy that constraint. If a kernel does satisfy 
`INTERSECTION`, replacing calls with nulls will never be an error.
   
   Sidebar: this is an example of why we might alter Function::Dispatch* to 
examine FunctionOptions as well as argument types. NullHandling::INTERSECTION 
is more efficient when it's available and we could break 
`binary_join_element_wise`'s kernels out into one for each 
`JoinOptions::NullHandlingBehavior`. This would reduce branching inside the 
kernel as well as giving the kernel for `JoinOptions::EMIT_NULL` a boost




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to