amoeba commented on issue #45901: URL: https://github.com/apache/arrow/issues/45901#issuecomment-2790831738
Thanks @jimjam-slam. I see DuckDB's CSV reader has a `null_padding` option (see https://duckdb.org/docs/stable/data/csv/overview.html), and it might be possible to add something like that to Arrow's CSV reader. A PR would be welcome.

In the meantime, you could load the data with DuckDB and then pass it to the arrow R package if you need functionality that only exists in arrow:

```r
library(arrow)
library(duckdb)
library(dplyr)
library(dbplyr)

con <- dbConnect(duckdb())

# duckdb R's read_csv function doesn't support null_padding so we use SQL to load the data
dbExecute(con, "CREATE TABLE preferences AS SELECT * FROM read_csv('aec-senate-formalpreferences-27966-NSW.csv', null_padding = true);")

# Hand the DuckDB table to arrow, then aggregate with dplyr verbs
tbl(con, "preferences") |>
  to_arrow() |>
  group_by(State) |>
  summarize(n = n()) |>
  collect()
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org