thisisnic commented on a change in pull request #10624:
URL: https://github.com/apache/arrow/pull/10624#discussion_r662899545
##########
File path: r/R/expression.R
##########
@@ -32,6 +32,8 @@
"str_to_upper" = "utf8_upper",
"str_reverse" = "utf8_reverse",
# str_trim is defined in dplyr-functions.R
+ # str_sub is defined in dplyr-functions.R
+ # substr is defined in dplyr-functions.R
"year" = "year",
Review comment:
Should we mention `substring` here as well?
##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -866,3 +866,57 @@ test_that("str_like", {
df
)
})
+
+test_that("substrings", {
+ df <- tibble(
+ x = "Apache Arrow"
+ )
+
+ expect_dplyr_equal(
+ input %>%
+ mutate(
+ foo = "Apache Arrow",
+ bar1 = substr(foo, 1, 6), # Apache
+ bar2 = substr(foo, 0, 6), # Apache
+ bar3 = substr(foo, -1, 6), # Apache
+ bar4 = substr(foo, 6, 1), # ""
+ bar5 = substr(foo, -1, -2), # ""
+ bar6 = substr(foo, 8, 12) # Arrow
Review comment:
No need to set up string "Apache Arrow" via variable `foo` as it's
already been set up as variable `x`.
```suggestion
bar1 = substr(x, 1, 6), # Apache
bar2 = substr(x, 0, 6), # Apache
bar3 = substr(x, -1, 6), # Apache
bar4 = substr(x, 6, 1), # ""
bar5 = substr(x, -1, -2), # ""
bar6 = substr(x, 8, 12) # Arrow
```
##########
File path: r/R/dplyr-functions.R
##########
@@ -280,6 +280,45 @@ nse_funcs$str_trim <- function(string, side = c("both",
"left", "right")) {
Expression$create(trim_fun, string)
}
+nse_funcs$substr <- function(string, start, stop) {
+ assert_that(
+ length(start) == 1,
+ msg = "Start of length != 1 not supported in Arrow"
+ )
+ assert_that(
+ length(end) == 1,
+ msg = "End of length != 1 not supported in Arrow"
+ )
+
+ if (start > stop) {
+ start <- 0
+ stop <- 0
+ } else {
+ start <- max(0, start - 1)
+ stop <- max(0, stop)
+ start_stop <- c(min(start, stop), max(start, stop))
+ start <- start_stop[1]
+ stop <- start_stop[2]
+ }
Review comment:
I'm a little confused here - please can you give me an example of values
that could be supplied as `start` and `stop` which would make it necessary to
use `min()` and `max()` here to put the values back in the correct order?
##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -866,3 +866,57 @@ test_that("str_like", {
df
)
})
+
+test_that("substrings", {
+ df <- tibble(
+ x = "Apache Arrow"
+ )
+
+ expect_dplyr_equal(
+ input %>%
+ mutate(
+ foo = "Apache Arrow",
+ bar1 = substr(foo, 1, 6), # Apache
Review comment:
Please can you split these out into separate tests? It makes it easier
to follow and fix issues.
##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -866,3 +866,57 @@ test_that("str_like", {
df
)
})
+
+test_that("substrings", {
+ df <- tibble(
+ x = "Apache Arrow"
+ )
+
+ expect_dplyr_equal(
+ input %>%
+ mutate(
+ foo = "Apache Arrow",
+ bar1 = substr(foo, 1, 6), # Apache
+ bar2 = substr(foo, 0, 6), # Apache
+ bar3 = substr(foo, -1, 6), # Apache
+ bar4 = substr(foo, 6, 1), # ""
+ bar5 = substr(foo, -1, -2), # ""
+ bar6 = substr(foo, 8, 12) # Arrow
+ ) %>%
+ select(bar1:bar6) %>%
Review comment:
```suggestion
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]