Zsolt Kegyes-Brassai created ARROW-16863:
--------------------------------------------

             Summary: [R] open_dataset() silently drops the missing values from a csv file
                 Key: ARROW-16863
                 URL: https://issues.apache.org/jira/browse/ARROW-16863
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Zsolt Kegyes-Brassai


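{{open_dataset()}} +silently+ drops empty (missing) values when reading a CSV file. In the example below, the empty value was produced by {{write_csv_arrow()}} when writing a data frame that contains an {{NA}} value. Because the file has a single column, the missing value is written as an empty line, and that row is missing when the file is read back: the result has 7 rows instead of 8.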

 
{code}
df_numbers <- tibble::tibble(number = c(1, 2, "error", 4, 5, NA, 7, 8))
arrow::write_csv_arrow(df_numbers, "numbers.csv")
readLines("numbers.csv")
#> [1] "\"number\"" "\"1\""      "\"2\""      "\"error\""  "\"4\""     
#> [6] "\"5\""      ""           "\"7\""      "\"8\""
arrow::open_dataset("numbers.csv", format = "csv") |> dplyr::collect()
#> # A tibble: 7 x 1
#>   number
#>   <chr> 
#> 1 1     
#> 2 2     
#> 3 error 
#> 4 4     
#> 5 5     
#> 6 7     
#> 7 8
{code}
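A possible explanation, not confirmed here: the missing value ends up as an entirely empty line (element 7 of the {{readLines()}} output above), and Arrow's CSV parser skips empty lines by default ({{ignore_empty_lines}} in the parse options). The sketch below assumes that this is the cause. It reads the same file with {{read_csv_arrow()}}, once with the defaults and once with {{skip_empty_rows = FALSE}}. With the second call the row should be kept and read back as {{NA}}, because {{""}} is one of the default {{na}} strings. This sketch does not check whether {{open_dataset()}} accepts an equivalent option in this version.

```r
library(arrow)

# Recreate the file from the report in a temporary location.
# write_csv_arrow() writes the NA as an empty field, which in a
# one-column CSV is an entirely empty line.
df_numbers <- tibble::tibble(number = c(1, 2, "error", 4, 5, NA, 7, 8))
path <- tempfile(fileext = ".csv")
write_csv_arrow(df_numbers, path)

# Defaults (skip_empty_rows = TRUE): empty lines are skipped, so the
# NA row is expected to disappear, as it does with open_dataset().
tbl_default <- read_csv_arrow(path)

# Keep empty lines. With the default na = c("", "NA"), the empty field
# should then come back as NA instead of being dropped.
tbl_kept <- read_csv_arrow(path, skip_empty_rows = FALSE)

nrow(tbl_default)
nrow(tbl_kept)
tbl_kept$number
```

If this is the cause, the problem would presumably only affect single-column files. With two or more columns, a missing value still leaves a delimiter on the line, so the line is not empty.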
 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)