[GitHub] [arrow] paleolimbot commented on a diff in pull request #34708: GH-33287: [R] Cannot read_parquet on http URL

via GitHub Mon, 27 Mar 2023 06:24:50 -0700


paleolimbot commented on code in PR #34708:
URL: https://github.com/apache/arrow/pull/34708#discussion_r1149234022



##########
r/R/io.R:
##########
@@ -239,6 +239,14 @@ make_readable_file <- function(file, mmap = TRUE) {
     path <- sub("/$", "", file$base_path)
     file <- filesystem$OpenInputFile(path)
   } else if (is.string(file)) {
+
+    # if this is a HTTP URL, we need a local copy to pass to FileSystem$from_uri
+    if (is_http_url(file)) {
+        tf <- tempfile()
+        download.file(file, tf)

Review Comment:
   To very specifically answer that question:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
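   # Copy an arrow InputStream to `sink` in chunks of `chunk_size` bytes
   copy_input_stream <- function(input_stream, sink, chunk_size = 2 ^ 20) {
     output_stream <- arrow:::make_output_stream(sink)
     on.exit(output_stream$close())
     # Read() returns a zero-length Buffer once the stream is exhausted
     while ((chunk <- input_stream$Read(chunk_size))$size > 0) {
       output_stream$write(chunk)
     }
   }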
   
   addr <- "http://httpbin.org/base64/SFRUUEJJTiBpcyBhd2Vzb21l"
   input_stream <- arrow:::make_readable_file(addr)
   
   tmp <- tempfile()
   copy_input_stream(input_stream, tmp)
   readr::read_file(tmp)
   #> [1] "HTTPBIN is awesome"
   readr::read_file(url(addr))
   #> [1] "HTTPBIN is awesome"
   ```
   
   ...however, that example still goes through the same infrastructure that R 
uses for `download.file()`, so it is probably simplest to just use 
`download.file()` for now (though maybe with `quiet = TRUE`).
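   
   As a rough sketch of what that could look like (not the PR's actual code): 
the diff calls `is_http_url()` without showing its definition, so the regex 
below is an assumption, and `local_copy_if_http()` is a hypothetical name used 
only for illustration.
   
   ``` r
   # Assumed definition: the diff only shows the call to is_http_url()
   is_http_url <- function(x) {
     is.character(x) && length(x) == 1 && !is.na(x) &&
       grepl("^https?://", x, ignore.case = TRUE)
   }
   
   # Hypothetical helper: returns a local path that can be opened as a file
   local_copy_if_http <- function(file) {
     if (!is_http_url(file)) {
       return(file)
     }
     tf <- tempfile()
     # quiet = TRUE suppresses the progress output;
     # mode = "wb" keeps binary files (e.g. Parquet) intact on Windows
     utils::download.file(file, tf, quiet = TRUE, mode = "wb")
     tf
   }
   ```
   
   Non-HTTP inputs are returned unchanged, so the existing string handling 
would still apply to them.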
   
   A better reason to "use arrow's infrastructure" would be consistency: using 
the version of CURL + InputStream infrastructure that's already available. 
That's not a battle for this PR, though.



