Carl Boettiger created ARROW-16619:
--------------------------------------
Summary: read_csv_arrow / open_dataset over https connection?
Key: ARROW-16619
URL: https://issues.apache.org/jira/browse/ARROW-16619
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Carl Boettiger
Currently, remote access to data (particularly lazy reads, an immensely powerful
arrow ability) only works for data in an S3-compatible object store (though I
know Azure support is in the works). It would be really fantastic if we could
have remote access over HTTPS (I think this already works on the Python side
thanks to fsspec).
For example, this fails in arrow but works in readr:
arrow::read_csv_arrow("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
readr::read_csv("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
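Until arrow can read from an HTTPS URL directly, one possible workaround is to download the file first and point arrow at the local copy. The following is only a minimal sketch under that assumption: `read_csv_arrow_via_tempfile()` is a hypothetical helper name, not part of arrow, and it relies on arrow detecting gzip compression from the `.gz` file extension. It downloads the whole file eagerly, so it does not give the lazy access that `open_dataset()` would.

```r
# Hypothetical helper (not an arrow function): download a remote CSV to a
# temporary file, then read the local copy with arrow.
read_csv_arrow_via_tempfile <- function(url, ...) {
  # Preserve a ".gz" suffix so the compression can be detected from the
  # file name; otherwise treat the file as a plain CSV.
  ext <- if (grepl("\\.gz$", url)) ".csv.gz" else ".csv"
  tmp <- tempfile(fileext = ext)

  # mode = "wb" avoids corrupting compressed (binary) files on Windows.
  utils::download.file(url, tmp, mode = "wb", quiet = TRUE)

  # Extra arguments are passed through to arrow::read_csv_arrow().
  # The temporary file is left in place; R removes it at session end.
  arrow::read_csv_arrow(tmp, ...)
}
```

With this helper, the failing call above could be written as `read_csv_arrow_via_tempfile("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")`, at the cost of a full download on every call.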
I think this ability would be even more compelling in `open_dataset()`, since
it would open up all the power of lazy read access. Most servers support
HTTP range requests, so it seems this should be possible. (We can already do
something similar from duckdb+R, but only after manually opting in to the
httpfs extension, and only for parquet.)