boshek opened a new pull request, #13183:
URL: https://github.com/apache/arrow/pull/13183

   This PR enables reading and writing compressed data streams over S3 and locally, and adds tests exercising some of those round trips. For the filesystem path I had to do a little regex on the string to detect the compression, but any feedback on alternative approaches is very welcome. Previously, supplying a file path with a compression extension wrote out an uncompressed file:
   
   ```r
   library(arrow, warn.conflicts = FALSE)
   ## local
   file <- tempfile(fileext = ".csv")
   comp_file <- tempfile(fileext = ".csv.gz")
   write_csv_arrow(mtcars, file = file)
   write_csv_arrow(mtcars, file = comp_file)
   file.size(file)
   #> [1] 1303
   file.size(comp_file)
   #> [1] 567
   
   ## or with s3
   dir <- tempfile()
   dir.create(dir)
   subdir <- file.path(dir, "bucket")
   dir.create(subdir)
   
   minio_server <- processx::process$new("minio", args = c("server", dir), supervise = TRUE)
   Sys.sleep(2)
   stopifnot(minio_server$is_alive())
   
   s3_uri <- "s3://minioadmin:minioadmin@?scheme=http&endpoint_override=localhost%3A9000"
   bucket <- s3_bucket(s3_uri)
   
   write_csv_arrow(mtcars, bucket$path("bucket/data.csv.gz"))
   write_csv_arrow(mtcars, bucket$path("bucket/data.csv"))
   
   file.size(file.path(subdir, "data.csv.gz"))
   #> [1] 567
   file.size(file.path(subdir, "data.csv"))
   #> [1] 1303
   ```
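
   The extension-based detection mentioned above could be sketched roughly as follows. This is an illustrative stand-alone sketch, not the actual arrow internal; `detect_compression()` and the extension-to-codec mapping are assumptions for demonstration:

   ```r
   # Hypothetical helper: map a file path's final extension to a codec name.
   # The extension list here is an assumption, not arrow's actual mapping.
   detect_compression <- function(path) {
     ext <- tolower(sub(".*\\.([a-z0-9]+)$", "\\1", path))
     switch(ext,
       gz  = "gzip",
       bz2 = "bz2",
       zst = "zstd",
       lz4 = "lz4",
       "uncompressed"  # default: no recognized compression extension
     )
   }

   detect_compression("bucket/data.csv.gz")  # "gzip"
   detect_compression("bucket/data.csv")     # "uncompressed"
   ```

   A regex like this only inspects the final extension, so `data.csv.gz` is detected as gzip while plain `data.csv` falls through to uncompressed, matching the file sizes shown above.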


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
