bkietz commented on a change in pull request #10547:
URL: https://github.com/apache/arrow/pull/10547#discussion_r654573800
##########
File path: r/R/dplyr-functions.R
##########
@@ -215,6 +215,54 @@ nse_funcs$nchar <- function(x, type = "chars", allowNA =
FALSE, keepNA = NA) {
}
}
+nse_funcs$paste <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE)
{
+ assert_that(is.null(collapse))
+ arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., sep)
+}
+
+nse_funcs$paste0 <- function(..., collapse = NULL, recycle0 = FALSE) {
+ assert_that(is.null(collapse))
+ arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., "")
+}
+
+nse_funcs$str_c <- function(..., sep = "", collapse = NULL) {
+ assert_that(is.null(collapse))
+ arrow_string_join_function(NullHandlingBehavior$EMIT_NULL)(..., sep)
+}
+
+arrow_string_join_function <- function(null_handling, null_replacement = NULL)
{
+ # the `binary_join_element_wise` Arrow C++ compute kernel takes the separator
+ # as the last argument, so pass `sep` as the last dots arg to this function
+ function(...) {
+ args <- lapply(list(...), function(arg) {
+ # handle scalar literal args, and cast all args to string for
+ # consistency with base::paste(), base::paste0(), and stringr::str_c()
+ if (!inherits(arg, "Expression")) {
+ assert_that(
+ length(arg) == 1,
+ msg = "Literal vectors of length != 1 not supported in string
concatenation"
+ )
+ # handle scalar literal NA consistent with the binary_join_element_wise
+ # kernel's handling of nulls in the data
+ if (null_handling == NullHandlingBehavior$REPLACE && is.na(arg)) {
+ arg <- null_replacement
+ }
Review comment:
This comment is incorrect:
```c++
- // This is redundant but expression simplification uses this to
potentially replace
- // calls with null
+ // binary_join_element_wise allocates its own null bitmap (and only
produces an
+ // intersection of input null bitmaps for JoinOptions::EMIT_NULL)
```
If we see an error in expression simplification, that should
be repaired in the expression simplification code. Kernels should never be
required
to make concessions for how expressions will consume them.
In this case the flag `kernel.null_handling = NullHandling::INTERSECTION;`
is not correct
for `binary_join_element_wise` since `JoinOptions::SKIP, REPLACE` don't
satisfy that
constraint. If a kernel does satisfy `INTERSECTION`, replacing calls with
nulls will never
be an error.
Sidebar: this is an example of why we might alter Function::Dispatch* to
examine FunctionOptions
as well as argument types. NullHandling::INTERSECTION is more efficient when
it's
available and we could break `binary_join_element_wise`'s kernels out into
one for each
`JoinOptions::NullHandlingBehavior`. This would reduce branching inside the
kernel as
well as giving the kernel for `JoinOptions::EMIT_NULL` a boost
##########
File path: r/R/dplyr-functions.R
##########
@@ -215,6 +215,54 @@ nse_funcs$nchar <- function(x, type = "chars", allowNA =
FALSE, keepNA = NA) {
}
}
+nse_funcs$paste <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE)
{
+ assert_that(is.null(collapse))
+ arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., sep)
+}
+
+nse_funcs$paste0 <- function(..., collapse = NULL, recycle0 = FALSE) {
+ assert_that(is.null(collapse))
+ arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., "")
+}
+
+nse_funcs$str_c <- function(..., sep = "", collapse = NULL) {
+ assert_that(is.null(collapse))
+ arrow_string_join_function(NullHandlingBehavior$EMIT_NULL)(..., sep)
+}
+
+arrow_string_join_function <- function(null_handling, null_replacement = NULL)
{
+ # the `binary_join_element_wise` Arrow C++ compute kernel takes the separator
+ # as the last argument, so pass `sep` as the last dots arg to this function
+ function(...) {
+ args <- lapply(list(...), function(arg) {
+ # handle scalar literal args, and cast all args to string for
+ # consistency with base::paste(), base::paste0(), and stringr::str_c()
+ if (!inherits(arg, "Expression")) {
+ assert_that(
+ length(arg) == 1,
+ msg = "Literal vectors of length != 1 not supported in string
concatenation"
+ )
+ # handle scalar literal NA consistent with the binary_join_element_wise
+ # kernel's handling of nulls in the data
+ if (null_handling == NullHandlingBehavior$REPLACE && is.na(arg)) {
+ arg <- null_replacement
+ }
Review comment:
This comment is incorrect:
```c++
- // This is redundant but expression simplification uses this to
potentially replace
- // calls with null
+ // binary_join_element_wise allocates its own null bitmap (and only
produces an
+ // intersection of input null bitmaps for JoinOptions::EMIT_NULL)
```
If we see an error in expression simplification, that should be repaired in
the expression simplification code. Kernels should never be required to make
concessions for how expressions will consume them.
In this case the flag `kernel.null_handling = NullHandling::INTERSECTION;`
is not correct for `binary_join_element_wise` since `JoinOptions::SKIP,
REPLACE` don't satisfy that constraint. If a kernel does satisfy
`INTERSECTION`, replacing calls with nulls will never be an error.
Sidebar: this is an example of why we might alter Function::Dispatch* to
examine FunctionOptions as well as argument types. NullHandling::INTERSECTION
is more efficient when it's available and we could break
`binary_join_element_wise`'s kernels out into one for each
`JoinOptions::NullHandlingBehavior`. This would reduce branching inside the
kernel as well as giving the kernel for `JoinOptions::EMIT_NULL` a boost
##########
File path: r/R/dplyr-functions.R
##########
@@ -215,6 +215,54 @@ nse_funcs$nchar <- function(x, type = "chars", allowNA =
FALSE, keepNA = NA) {
}
}
+nse_funcs$paste <- function(..., sep = " ", collapse = NULL, recycle0 = FALSE)
{
+ assert_that(is.null(collapse))
+ arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., sep)
+}
+
+nse_funcs$paste0 <- function(..., collapse = NULL, recycle0 = FALSE) {
+ assert_that(is.null(collapse))
+ arrow_string_join_function(NullHandlingBehavior$REPLACE, "NA")(..., "")
+}
+
+nse_funcs$str_c <- function(..., sep = "", collapse = NULL) {
+ assert_that(is.null(collapse))
+ arrow_string_join_function(NullHandlingBehavior$EMIT_NULL)(..., sep)
+}
+
+arrow_string_join_function <- function(null_handling, null_replacement = NULL)
{
+ # the `binary_join_element_wise` Arrow C++ compute kernel takes the separator
+ # as the last argument, so pass `sep` as the last dots arg to this function
+ function(...) {
+ args <- lapply(list(...), function(arg) {
+ # handle scalar literal args, and cast all args to string for
+ # consistency with base::paste(), base::paste0(), and stringr::str_c()
+ if (!inherits(arg, "Expression")) {
+ assert_that(
+ length(arg) == 1,
+ msg = "Literal vectors of length != 1 not supported in string
concatenation"
+ )
+ # handle scalar literal NA consistent with the binary_join_element_wise
+ # kernel's handling of nulls in the data
+ if (null_handling == NullHandlingBehavior$REPLACE && is.na(arg)) {
+ arg <- null_replacement
+ }
Review comment:
This comment is incorrect:
```diff
- // This is redundant but expression simplification uses this to
potentially replace
- // calls with null
+ // binary_join_element_wise allocates its own null bitmap (and only
produces an
+ // intersection of input null bitmaps for JoinOptions::EMIT_NULL)
```
If we see an error in expression simplification, that should be repaired in
the expression simplification code. Kernels should never be required to make
concessions for how expressions will consume them.
In this case the flag `kernel.null_handling = NullHandling::INTERSECTION;`
is not correct for `binary_join_element_wise` since `JoinOptions::SKIP,
REPLACE` don't satisfy that constraint. If a kernel does satisfy
`INTERSECTION`, replacing calls with nulls will never be an error.
Sidebar: this is an example of why we might alter Function::Dispatch* to
examine FunctionOptions as well as argument types. NullHandling::INTERSECTION
is more efficient when it's available and we could break
`binary_join_element_wise`'s kernels out into one for each
`JoinOptions::NullHandlingBehavior`. This would reduce branching inside the
kernel as well as giving the kernel for `JoinOptions::EMIT_NULL` a boost
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]