paleolimbot commented on code in PR #34708:
URL: https://github.com/apache/arrow/pull/34708#discussion_r1156336779


##########
r/R/io.R:
##########
@@ -239,6 +239,14 @@ make_readable_file <- function(file, mmap = TRUE) {
     path <- sub("/$", "", file$base_path)
     file <- filesystem$OpenInputFile(path)
   } else if (is.string(file)) {
+
+    # if this is a HTTP URL, we need a local copy to pass to 
FileSystem$from_uri

Review Comment:
   It solves the problem in a way that will not work for the other type of file 
we read that requires random access (Arrow non-stream files). It also does not 
help with the related problem where sometimes we very explicitly do not want a 
random access file (e.g., when reading an arrow stream over a socket) and the 
existing code tries to make one and fails.
   
   The root cause of both problems is that we do not pass all of our 
requirements about what type of file we need (in this case, random access) to 
`make_readable_file()`. I appreciate this solution solves the most common case 
of this; however, it exchanges one problem (reading Parquet from a URL) with 
increased indirection (we would now have code that makes a readable file that 
is not contained within `make_readable_file()`). I've outlined a few ideas 
about how one could go about this as part of this review...I am also happy to 
more concretely prototype a solution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to