jonkeane commented on a change in pull request #11232:
URL: https://github.com/apache/arrow/pull/11232#discussion_r718651282
##########
File path: r/R/dplyr-functions.R
##########
@@ -330,6 +330,39 @@ arrow_string_join_function <- function(null_handling,
null_replacement = NULL) {
}
}
+# Currently, Arrow does not supports a locale option for string case conversion
+# functions, contrast to stringr's API, so the 'locale' argument is only valid
+# for stringr's default value ("en"). The following are string functions that
+# take a 'locale' option as its second argument:
+# str_to_lower
+# str_to_upper
+# str_to_title
+#
+# Arrow locale will be supported with ARROW-14126
+nse_funcs$str_to_lower <- function(string, locale = "en") {
+ if (!identical(locale, "en")) {
+ stop("Providing 'locale' to 'str_to_lower' is not supported in Arrow; ",
+ "to change locale use 'Sys.setlocale()'", call. = FALSE)
Review comment:
I like Nic's phrasing here. Seeing it repeated three times: should we
write/store the message once and use it in each `stop` so we know the same
message is being used? If we do want to keep the function name in it (I don't
think we need to, but if we do) we could make a small helper that slots that in.
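A minimal sketch of what such a helper might look like. The helper name `stop_if_locale_provided` is hypothetical, the `nse_funcs` list is a stand-in so the snippet runs on its own, and `tolower()`/`toupper()` are placeholders for whatever Arrow expression the real bindings build:

```r
# Hypothetical helper: holds the message in one place and slots the
# calling function's name into it.
stop_if_locale_provided <- function(locale, fun_name) {
  if (!identical(locale, "en")) {
    stop(
      "Providing 'locale' to '", fun_name, "' is not supported in Arrow; ",
      "to change locale use 'Sys.setlocale()'",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Stand-in for the package's function registry (assumption, for illustration).
nse_funcs <- list()

nse_funcs$str_to_lower <- function(string, locale = "en") {
  stop_if_locale_provided(locale, "str_to_lower")
  tolower(string) # placeholder for the real Arrow binding
}

nse_funcs$str_to_upper <- function(string, locale = "en") {
  stop_if_locale_provided(locale, "str_to_upper")
  toupper(string) # placeholder for the real Arrow binding
}
```

With this shape, each binding contributes only its own name, so the wording cannot drift between the three functions.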
##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -471,6 +471,46 @@ test_that("strsplit and str_split", {
)
})
+test_that("str_to_lower, str_to_upper, and str_to_title", {
+ df <- tibble(x = c("Foo", " B\na R", "ⱭɽⱤoW", "ıI"))
Review comment:
I've triggered our as-cran build to check this. We might want (or in
fact need) to turn these non-ASCII characters into their Unicode escapes so as
not to anger the CRAN checks (cf.
https://github.com/apache/arrow/blob/master/r/tests/testthat/test-json.R#L26)
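For reference, a small base-R sketch of deriving the escapes. The function name `to_unicode_escapes` is hypothetical, and it assumes every code point fits in four hex digits (anything above U+FFFF would need the `\U` form instead):

```r
# Hypothetical helper: rewrite one string so that every non-ASCII character
# is shown as a \uXXXX escape. ASCII characters pass through unchanged.
to_unicode_escapes <- function(x) {
  cps <- utf8ToInt(x) # integer code points of a single string
  pieces <- ifelse(
    cps < 128L,
    intToUtf8(cps, multiple = TRUE), # keep ASCII as-is
    sprintf("\\u%04x", cps)          # escape everything else
  )
  paste0(pieces, collapse = "")
}

# Dotless i (U+0131) followed by an ASCII capital I.
cat(to_unicode_escapes("\u0131I"), "\n")
```

Running the helper over each string in the test data frame gives the text to paste back into the test file between quotes.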