paleolimbot commented on code in PR #34708:
URL: https://github.com/apache/arrow/pull/34708#discussion_r1156336779
##########
r/R/io.R:
##########
@@ -239,6 +239,14 @@ make_readable_file <- function(file, mmap = TRUE) {
path <- sub("/$", "", file$base_path)
file <- filesystem$OpenInputFile(path)
} else if (is.string(file)) {
+
+ # if this is a HTTP URL, we need a local copy to pass to
FileSystem$from_uri
Review Comment:
It solves the problem in a way that will not work for the other type of file
we read that requires random access (Arrow non-stream files). It also does not
help with the related problem where sometimes we very explicitly do not want a
random access file (e.g., when reading an arrow stream over a socket) and the
existing code tries to make one and fails.
The root cause of both problems is that we do not pass all of our
requirements about what type of file we need (in this case, random access) to
`make_readable_file()`. I appreciate this solution solves the most common case
of this; however, it exchanges one problem (reading Parquet from a URL) with
increased indirection (we would now have code that makes a readable file that
is not contained within `make_readable_file()`). I've outlined a few ideas
about how one could go about this as part of this review...I am also happy to
more concretely prototype a solution.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]