boshek opened a new pull request, #13183:
URL: https://github.com/apache/arrow/pull/13183
This PR enables reading/writing compressed data streams over S3 and locally,
and adds tests exercising some of those round trips. For the filesystem path
I had to do a little regex on the string for compression detection, so any
feedback on alternative approaches is very welcome. Previously, supplying a
file path with a compression extension wrote out an uncompressed file:
```r
library(arrow, warn.conflicts = FALSE)

## local
file <- tempfile(fileext = ".csv")
comp_file <- tempfile(fileext = ".csv.gz")
write_csv_arrow(mtcars, file = file)
write_csv_arrow(mtcars, file = comp_file)
file.size(file)
#> [1] 1303
file.size(comp_file)
#> [1] 567

## or with s3
dir <- tempfile()
dir.create(dir)
subdir <- file.path(dir, "bucket")
dir.create(subdir)
minio_server <- processx::process$new("minio", args = c("server", dir),
                                      supervise = TRUE)
Sys.sleep(2)
stopifnot(minio_server$is_alive())
s3_uri <-
  "s3://minioadmin:minioadmin@?scheme=http&endpoint_override=localhost%3A9000"
bucket <- s3_bucket(s3_uri)
write_csv_arrow(mtcars, bucket$path("bucket/data.csv.gz"))
write_csv_arrow(mtcars, bucket$path("bucket/data.csv"))
file.size(file.path(subdir, "data.csv.gz"))
#> [1] 567
file.size(file.path(subdir, "data.csv"))
#> [1] 1303
```
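To illustrate the kind of extension-based detection the PR describes, here is a minimal, hypothetical sketch (not the PR's actual implementation; `detect_compression` and the extension-to-codec mapping are assumptions for illustration only) of mapping a file path to a codec name via a regex on the extension:

```r
# Hypothetical helper: infer a compression codec from a file path's
# final extension using a regex. This is an illustrative sketch, not
# the code used in the PR.
detect_compression <- function(path) {
  # capture the text after the last "." (if any), lowercased
  ext <- tolower(sub(".*\\.([A-Za-z0-9]+)$", "\\1", path))
  switch(ext,
    gz  = "gzip",
    bz2 = "bz2",
    zst = "zstd",
    lz4 = "lz4",
    "uncompressed"  # default when the extension is not a known codec
  )
}

detect_compression("bucket/data.csv.gz")  # "gzip"
detect_compression("bucket/data.csv")     # "uncompressed"
```

One design consideration with this approach is that only the final extension is inspected, so `data.csv.gz` is treated as gzip-compressed CSV while `data.csv` falls through to uncompressed.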