[ https://issues.apache.org/jira/browse/ARROW-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309542#comment-17309542 ]
Neal Richardson commented on ARROW-11513:
-----------------------------------------
Looking at the [options struct|https://github.com/apache/arrow/pull/8468/files#diff-6bc7ecec6a4f7bcefc2511cde3bd809340ad0d94bb8f7cc5f4994063c798f2faR72-R83]
and the [re2 syntax|https://github.com/google/re2/wiki/Syntax], here are some
notes on how to map the R arguments onto them:
* gsub/str_replace_all corresponds to max_replacements = -1 (the default);
sub/str_replace corresponds to max_replacements = 1
* fixed = FALSE (default) means to use the "replace_substring_regex" function;
fixed = TRUE means to use "replace_substring"
* if ignore.case = TRUE and fixed = FALSE, we can prefix the pattern with an
inline flag, as in {{paste0("(?i)", pattern)}}. Per the re2 syntax page, the
scoped form {{paste0("(?i:", pattern, ")")}} is also valid, but
{{paste0("(?i", pattern, ")")}} is not. See also the [stringi
docs|https://stringi.gagolewski.com/rapi/stri_opts_regex.html]. It is unclear
whether we have a case-insensitive, non-regex option
* stringr handles case insensitivity differently, using a stringi options
struct, so we may need to deal with that (or defer)
* useBytes: unclear that this is an option, or if it is relevant (per the docs
for {{sub}}, "The main effect of ‘useBytes = TRUE’ is to avoid errors/warnings
about invalid inputs and spurious matches in multibyte locales")
* perl: unclear that this is an option, or if it is relevant
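
To make the mapping above concrete, here is a minimal, language-neutral sketch in Python rather than R. The helper name {{translate_sub_args}} and its argument names are hypothetical, invented for illustration; only the kernel names, the max_replacements values, and the {{(?i)}} prefix come from the notes above. It is not the actual binding code.

```python
def translate_sub_args(pattern, fixed=False, ignore_case=False, replace_all=True):
    """Hypothetical helper: map R-style sub/gsub arguments to a kernel name and options."""
    if ignore_case and fixed:
        # The notes above found no case-insensitive, non-regex option,
        # so this sketch refuses rather than guessing at a fallback.
        raise NotImplementedError("ignore.case = TRUE with fixed = TRUE is not mapped")
    if ignore_case:
        # re2 inline flag: "(?i)" switches on case-insensitive matching.
        pattern = "(?i)" + pattern
    return {
        # fixed = TRUE -> literal replacement; fixed = FALSE -> regex replacement
        "function": "replace_substring" if fixed else "replace_substring_regex",
        "pattern": pattern,
        # gsub/str_replace_all -> -1 (replace all); sub/str_replace -> 1
        "max_replacements": -1 if replace_all else 1,
    }


# A sub()-style call with ignore.case = TRUE:
sub_call = translate_sub_args("a.c", ignore_case=True, replace_all=False)
```

useBytes and perl are deliberately left out of the sketch, since it is unclear whether either has an equivalent.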
> [R] Bindings for sub/gsub
> -------------------------
>
> Key: ARROW-11513
> URL: https://issues.apache.org/jira/browse/ARROW-11513
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Neal Richardson
> Priority: Major
> Fix For: 4.0.0
>
>