[ 
https://issues.apache.org/jira/browse/ARROW-18241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628622#comment-17628622
 ] 

Neal Richardson commented on ARROW-18241:
-----------------------------------------

Two observations:

1. This isn't just about empty strings: cast string to int raises an error on 
any string that doesn't parse. I believe this was raised before but I can't 
seem to find an issue about it (that is, adding an option to return NA for 
values that don't parse instead of erroring). I agree this would be a nice 
option to have.

{code}
> arrow_table(x="a") %>% mutate(x = as.integer(x)) %>% collect()
Error in `compute.arrow_dplyr_query()`:
! Invalid: Failed to parse string: 'a' as a scalar of type int32
{code}

2. The ifelse workaround will work, and it should work as you typed it on the 
development version of the package. On the released version, you can make it 
work by explicitly making the NA be a string so the types match:

{code}
arrow_table(a=c("1", "", "3")) %>% 
  mutate(x = as.integer(ifelse(a == "", NA_character_, a))) %>% 
  collect()

# A tibble: 3 × 2
  a         x
  <chr> <int>
1 "1"       1
2 ""       NA
3 "3"       3
{code}
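
For the more general case in observation 1 (any string that doesn't parse, not just empty ones), the same workaround can probably be extended with a regex guard. This is a sketch under assumptions, not an Arrow feature: the pattern `^-?[0-9]+$` is only one guess at what should count as an integer, and it relies on `grepl()` being translated inside `mutate()` on Arrow data.

{code:r}
library(arrow)
library(dplyr)

result <- arrow_table(a = c("1", "", "a", "3")) %>%
  # Keep strings that look like integers (assumed pattern);
  # replace everything else with a string NA so the types match
  mutate(x = as.integer(ifelse(grepl("^-?[0-9]+$", a), a, NA_character_))) %>%
  collect()
{code}

Values that match the pattern but don't fit in int32 would presumably still raise the parse error, so this is not a full substitute for a cast option that returns NA.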


> [R] as.integer can't handle empty character cells (ex c(''))
> ------------------------------------------------------------
>
>                 Key: ARROW-18241
>                 URL: https://issues.apache.org/jira/browse/ARROW-18241
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Lucas Mation
>            Priority: Major
>
> I am importing a dataset with arrow, and then converting variable types. But 
> I got an error message because the `arrow` implementation of `as.integer` 
> can't handle empty strings (which are legal in base R). Is this a bug?
> {code:r}
> #In R
> '' %>% as.integer()
> [1] NA
>  
> #in arrow
> q <- data.table(x=c('','1','2'))
> q %>% write_dataset('q')
> q2 <- 'q' %>% open_dataset %>% mutate(x=as.integer(x)) %>% collect
> Error in `collect()`:
> ! Invalid: Failed to parse string: '' as a scalar of type int32
> Run `rlang::last_error()` to see where the error occurred.
> {code}
> Update: tried to preprocess x with `ifelse` but it also did not work.
> {code:r}
> paste0(p2,'/q') %>% open_dataset %>% mutate(x= ifelse(x=='',NA,x)) %>% 
> mutate(x=as.integer(x)) %>% collect
> Error in `collect()`:
> ! NotImplemented: Function 'if_else' has no kernel matching input types 
> (bool, bool, string)
> Run `rlang::last_error()` to see where the error occurred.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
