[
https://issues.apache.org/jira/browse/ARROW-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541497#comment-17541497
]
Dewey Dunnington commented on ARROW-16619:
------------------------------------------
I believe it will also work out-of-the-box for an IPC stream (or anything else
that doesn't require a RandomAccessFile)!
I think it's within scope for the R package to support a URL. I run into this
primarily when writing about Arrow, where I want to write something like
{{read_parquet("https://my.website/some_tiny.parquet")}} and instead have to
download to a temporary file myself, which takes up vertical space that I'd
rather avoid for clarity.
Pre-vroom readr supported this via a "download to temporary file + read +
delete temporary file on exit" strategy, which seems reasonable here as well.
I'd be happy to implement that unless there are objections.
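A minimal sketch of that strategy, to make the idea concrete. The helper name
{{read_via_tempfile()}} and its arguments are hypothetical (nothing like it
exists in arrow today), and it assumes the reader can consume a local file path
and reads eagerly:

```r
# Hypothetical helper, not part of arrow or readr:
# download to a temporary file, read it, delete the file on exit.
read_via_tempfile <- function(url, reader, ...) {
  # Keep the last file extension (e.g. "gz", "parquet") so that readers which
  # infer format or compression from the file name have something to go on.
  ext <- tools::file_ext(url)
  tmp <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")

  # Registered before the download so a failed download is cleaned up too.
  on.exit(unlink(tmp), add = TRUE)

  # mode = "wb" avoids corrupting binary files (parquet, gzip) on Windows.
  utils::download.file(url, tmp, mode = "wb", quiet = TRUE)

  reader(tmp, ...)
}

# Intended usage (not run here; needs the arrow package and network access):
# read_via_tempfile("https://my.website/some_tiny.parquet", arrow::read_parquet)
```

Because the temporary file is removed as soon as the function returns, this
only suits readers that pull everything into memory; a lazy reader such as
{{open_dataset()}} would need a different approach.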
> [R] Support compression + R connection (URL with .gz file)
> ----------------------------------------------------------
>
> Key: ARROW-16619
> URL: https://issues.apache.org/jira/browse/ARROW-16619
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Carl Boettiger
> Priority: Major
>
> Currently, remote access to data (particularly lazy read, an immensely
> powerful arrow ability) only works for data in an S3-compliant object store
> (though I know Azure support is in the works). It would be really fantastic
> if we could have remote access over HTTPS (I think this already works on the
> python side thanks to fsspec).
> For example, this fails in arrow but works in readr:
> arrow::read_csv_arrow("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
>
> readr::read_csv("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
> I think this ability would be even more compelling in `open_dataset()`, since
> it opens up for us all the power of lazy read access. Most servers support
> curl range requests so it seems this should be possible. (We can already do
> something similar from duckdb+R, but only after manually opting in to the http
> extension and only for parquet).