[
https://issues.apache.org/jira/browse/ARROW-18250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629822#comment-17629822
]
Neal Richardson commented on ARROW-18250:
-----------------------------------------
ARROW-18202 is masking another issue: substring replacement with NA (which
IMO doesn't really make sense, even though stringr supports it; see also
https://issues.apache.org/jira/browse/ARROW-18244?focusedCommentId=17628945&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17628945):
{code}
> data.frame(strs = c("one", "two", "three", "four")) |>
+   mutate(str_replace(strs, "o", NA_character_)) |> collect()
   strs str_replace(strs, "o", NA_character_)
1   one                                  <NA>
2   two                                  <NA>
3 three                                 three
4  four                                  <NA>
> arrow_table(strs = c("one", "two", "three", "four")) |>
+   mutate(str_replace(strs, "o", NA_character_)) |> collect()
# A tibble: 4 × 2
  strs  `str_replace(strs, "o", NA_character_)`
  <chr> <chr>
1 one   NAne
2 two   twNA
3 three three
4 four  fNAur
{code}
It looks like when the replacement is NA, we should map the call to
{{ifelse(grepl(pattern, x), NA, x)}} or something along those lines.
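For illustration, here is a minimal base-R sketch of that mapping. The helper name {{replace_or_na}} is hypothetical and is not part of arrow or stringr. It assumes base R regex semantics via {{grepl}}/{{sub}}, which can differ from stringr's ICU engine and from Arrow's kernels, so treat it as a description of the intended behaviour rather than the actual fix.

```r
# Hypothetical helper (not an arrow or stringr function): sketches the
# proposed behaviour when the replacement is NA.
replace_or_na <- function(x, pattern, replacement) {
  if (is.na(replacement)) {
    # Elements that match the pattern become NA; all others pass through.
    return(ifelse(grepl(pattern, x), NA_character_, x))
  }
  # Otherwise replace the first match only, as str_replace() does.
  sub(pattern, replacement, x)
}

strs <- c("one", "two", "three", "four")

# Elements 1, 2 and 4 contain "o" and become NA; "three" is unchanged.
replace_or_na(strs, "o", NA_character_)

# The case from the original report: whitespace-only strings to NA.
replace_or_na(c("", "1", "2"), "^\\s*$", NA_character_)
```

With this mapping the first call matches the plain data.frame result above instead of splicing the literal string "NA" into each value.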
> [R][C++] mutate(x2=x %>% str_replace('^\\s*$',NA_character_)) Does not
> replicate behaviour of R
> -----------------------------------------------------------------------------------------------
>
> Key: ARROW-18250
> URL: https://issues.apache.org/jira/browse/ARROW-18250
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, R
> Reporter: Lucas Mation
> Priority: Critical
>
> {code:r}
> q <- data.table(x=c('','1','2'))
> q %>% write_dataset('q')
> #in R
> q %>% mutate(x2=x %>% str_replace('^\\s*$',NA_character_))
>    x   x2
> 1:   <NA>
> 2: 1    1
> 3: 2    2
> #in arrow
> q2 <- 'q' %>% open_dataset %>%
>   mutate(x2=x %>% str_replace('^\\s*$',NA_character_)) %>% collect
> q2
>    x x2
> 1:
> 2: 1  1
> 3: 2  2
> {code}