amoeba commented on issue #45901:
URL: https://github.com/apache/arrow/issues/45901#issuecomment-2790831738

   Thanks @jimjam-slam. I see DuckDB's CSV reader has a `null_padding` option 
(see https://duckdb.org/docs/stable/data/csv/overview.html) and it might be 
possible to add something like that to arrow's CSV reader. A PR would be 
welcome.
   
   In the meantime, you could read the file with DuckDB and pass the data to 
the arrow R package if you need something that only exists there:
   
   ```r
   library(arrow)
   library(duckdb)
   library(dplyr)
   library(dbplyr)
   
   con <- dbConnect(duckdb())
   
   # duckdb R's read_csv function doesn't support null_padding, so we use SQL
   # to load the data (the file name is a SQL string literal, hence single quotes)
   dbExecute(con, "CREATE TABLE preferences AS SELECT * FROM
     read_csv('aec-senate-formalpreferences-27966-NSW.csv', null_padding = true);")
   
   tbl(con, "preferences") |> 
     to_arrow() |> 
     group_by(State) |> 
     summarize(n = n()) |>
     collect()
   ```
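
   To see what `null_padding` does without the full AEC file, here is a 
   minimal sketch on a small ragged CSV. The temporary file and its column 
   names (`State`, `Pref1`, `Pref2`) are made up for illustration, and I'm 
   assuming the short rows are only missing trailing fields:

   ```r
   library(DBI)
   library(duckdb)

   # Hypothetical ragged CSV: the last two rows are missing trailing fields
   path <- tempfile(fileext = ".csv")
   writeLines(c("State,Pref1,Pref2", "NSW,1,2", "NSW,1", "VIC,2"), path)

   con <- dbConnect(duckdb())

   # null_padding = true asks DuckDB to fill the missing trailing fields
   # with NULL instead of rejecting the short rows
   sql <- sprintf(
     "SELECT * FROM read_csv('%s', header = true, null_padding = true)",
     path
   )
   padded <- dbGetQuery(con, sql)
   print(padded)

   dbDisconnect(con, shutdown = TRUE)
   ```

   The padded values should come back as `NA` in the R data frame, so it's 
   worth checking that downstream code handles them as intended.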

