[ https://issues.apache.org/jira/browse/ARROW-18250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629822#comment-17629822 ]

Neal Richardson commented on ARROW-18250:
-----------------------------------------

ARROW-18202 is masking another issue: substring replacement with NA (which 
IMO doesn't really make sense, even though stringr supports it; see also 
https://issues.apache.org/jira/browse/ARROW-18244?focusedCommentId=17628945&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17628945).

{code}
> data.frame(strs = c("one", "two", "three", "four")) |> 
+ mutate(str_replace(strs, "o", NA_character_)) |> collect()
   strs str_replace(strs, "o", NA_character_)
1   one                                  <NA>
2   two                                  <NA>
3 three                                 three
4  four                                  <NA>

> arrow_table(strs = c("one", "two", "three", "four")) |> 
+ mutate(str_replace(strs, "o", NA_character_)) |> collect()
# A tibble: 4 × 2
  strs  `str_replace(strs, "o", NA_character_)`
  <chr> <chr>                                  
1 one   NAne                                   
2 two   twNA                                   
3 three three                                  
4 four  fNAur 
{code}

It looks like, when the replacement is NA, we should map the call to 
{{ifelse(grepl(pattern, x), NA, x)}} or something along those lines.
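A minimal sketch of that mapping in plain R, outside arrow. `str_replace_na_aware()` is a hypothetical helper name used only for illustration; it is not part of arrow or stringr, and it only special-cases a scalar NA replacement:

```r
library(stringr)

# Hypothetical helper, not part of arrow or stringr: mirrors stringr's
# behaviour for an NA replacement, i.e. any string with a match becomes NA.
str_replace_na_aware <- function(string, pattern, replacement) {
  if (length(replacement) == 1 && is.na(replacement)) {
    # Matched strings map to NA; unmatched strings pass through unchanged.
    return(ifelse(grepl(pattern, string), NA_character_, string))
  }
  # Non-NA replacement: defer to the ordinary substring replacement.
  str_replace(string, pattern, replacement)
}

str_replace_na_aware(c("one", "two", "three", "four"), "o", NA_character_)
str_replace_na_aware(c("", "1", "2"), "^\\s*$", NA_character_)
```

The first call gives NA for every string containing "o" and keeps "three", matching the data.frame result above; the second gives NA for the empty string and keeps "1" and "2". One caveat with the sketch: grepl() uses base R's regex engine while str_replace() uses ICU via stringi, so patterns relying on ICU-only syntax could differ; stringr::str_detect() would keep the engines consistent.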

> [R][C++] mutate(x2=x %>% str_replace('^\\s*$',NA_character_)) Does not replicate behaviour of R
> -----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-18250
>                 URL: https://issues.apache.org/jira/browse/ARROW-18250
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, R
>            Reporter: Lucas Mation
>            Priority: Critical
>
> {code:r}
> library(arrow)
> library(data.table)
> library(dplyr)
> library(stringr)
> q <- data.table(x=c('','1','2'))
> q %>% write_dataset('q')
> #in R
> q %>% mutate(x2=x %>% str_replace('^\\s*$',NA_character_))
>    x   x2
> 1:   <NA>
> 2: 1    1
> 3: 2    2
> #in arrow
> q2 <- 'q' %>% open_dataset %>% mutate(x2=x %>% str_replace('^\\s*$',NA_character_)) %>% collect
> q2
>    x x2
> 1:     
> 2: 1  1
> 3: 2  2
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
