nealrichardson commented on a change in pull request #11232:
URL: https://github.com/apache/arrow/pull/11232#discussion_r718770517
##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -471,6 +471,46 @@ test_that("strsplit and str_split", {
)
})
+test_that("str_to_lower, str_to_upper, and str_to_title", {
+ df <- tibble(x = c("Foo", " B\na R", "ⱭɽⱤoW", "ıI"))
Review comment:
I believe it was Solaris that would choke on this in the past.
Can we just rely on the C++ tests to confirm that Unicode etc. works? We just need to
test here that we can invoke the kernels correctly.
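For illustration, an ASCII-only version of the test might look like the sketch below. The input strings and the `convert_case()` helper are hypothetical and not taken from the PR, and plain `expect_equal()` stands in for whatever comparison helper the test file already uses.

```r
library(arrow)
library(dplyr)
library(stringr)
library(testthat)

# ASCII-only inputs (illustrative, not the PR's data).
df <- tibble(x = c("Foo", "bAR baz", "hello WORLD"))

# Hypothetical helper: runs the same expressions on whatever it is given,
# so the Arrow result can be compared against plain dplyr/stringr.
convert_case <- function(.data) {
  .data %>%
    transmute(
      lower = str_to_lower(x),
      upper = str_to_upper(x),
      title = str_to_title(x)
    ) %>%
    collect()
}

test_that("str_to_lower, str_to_upper, and str_to_title dispatch to Arrow", {
  # Table$create() routes the verbs through the Arrow bindings.
  expect_equal(convert_case(Table$create(df)), convert_case(df))
})
```

This only checks that the bindings reach the kernels and agree with stringr on simple input; Unicode correctness would stay with the C++ tests.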
##########
File path: r/R/dplyr-functions.R
##########
@@ -330,6 +330,35 @@ arrow_string_join_function <- function(null_handling,
null_replacement = NULL) {
}
}
+# Currently, Arrow does not support a locale option for string case conversion
+# functions, in contrast to stringr's API, so the 'locale' argument is only valid
+# for stringr's default value ("en"). The following are string functions that
+# take a 'locale' option as their second argument:
+# str_to_lower
+# str_to_upper
+# str_to_title
+#
+# Arrow locale will be supported with ARROW-14126
+.arrow_string_function_with_locale_arg <- function(func, string, locale) {
+ if (!identical(locale, "en")) {
+ stop("Providing a value for 'locale' other than the default ('en') is not supported by Arrow. ",
+ "To change locale, use 'Sys.setlocale()'", call. = FALSE)
+ }
+ Expression$create(func, string)
+}
+
+nse_funcs$str_to_lower <- function(string, locale = "en") {
+ .arrow_string_function_with_locale_arg("utf8_lower", string, locale)
+}
+
+nse_funcs$str_to_upper <- function(string, locale = "en") {
+ .arrow_string_function_with_locale_arg("utf8_upper", string, locale)
+}
+
+nse_funcs$str_to_title <- function(string, locale = "en") {
+ .arrow_string_function_with_locale_arg("utf8_title", string, locale)
+}
Review comment:
Sorry to bikeshed, but I find this more readable. What do you think,
@jonkeane?
```suggestion
stop_if_locale_provided <- function(locale) {
  if (!identical(locale, "en")) {
    stop("Providing a value for 'locale' other than the default ('en') is not supported by Arrow. ",
      "To change locale, use 'Sys.setlocale()'", call. = FALSE)
  }
}

nse_funcs$str_to_lower <- function(string, locale = "en") {
  stop_if_locale_provided(locale)
  Expression$create("utf8_lower", string)
}

nse_funcs$str_to_upper <- function(string, locale = "en") {
  stop_if_locale_provided(locale)
  Expression$create("utf8_upper", string)
}

nse_funcs$str_to_title <- function(string, locale = "en") {
  stop_if_locale_provided(locale)
  Expression$create("utf8_title", string)
}
```
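To see how the guard behaves without an Arrow build, here is a minimal base-R sketch. `str_to_upper_sketch()` is a hypothetical stand-in for the real binding: it calls `toupper()` where the suggestion calls `Expression$create("utf8_upper", string)`.

```r
# Guard from the suggestion, reproduced so the sketch runs on its own.
stop_if_locale_provided <- function(locale) {
  if (!identical(locale, "en")) {
    stop(
      "Providing a value for 'locale' other than the default ('en') is not supported by Arrow. ",
      "To change locale, use 'Sys.setlocale()'",
      call. = FALSE
    )
  }
}

# Hypothetical stand-in for nse_funcs$str_to_upper (assumption: base
# toupper() replaces the Arrow expression so no Arrow build is needed).
str_to_upper_sketch <- function(string, locale = "en") {
  stop_if_locale_provided(locale)
  toupper(string)
}

# The default locale passes straight through the guard.
default_result <- str_to_upper_sketch("foo")

# Any other locale raises the error; capture its message for inspection.
error_message <- tryCatch(
  str_to_upper_sketch("foo", locale = "tr"),
  error = function(e) conditionMessage(e)
)
```

Keeping the guard separate from `Expression$create()` makes each binding read as "validate, then build the expression", which seems to be the readability gain the suggestion is after.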