Carl Boettiger created ARROW-16619:
--------------------------------------

             Summary: read_csv_arrow / open_dataset over https connection?
                 Key: ARROW-16619
                 URL: https://issues.apache.org/jira/browse/ARROW-16619
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
            Reporter: Carl Boettiger


Currently, remote access to data (particularly lazy reads, an immensely powerful 
Arrow feature) only works for data in an S3-compatible object store (though I 
know Azure support is in the works). It would be really fantastic if we could 
also have remote access over HTTPS (I think this already works on the Python 
side, thanks to fsspec).

For example, this fails in arrow but works in readr:


arrow::read_csv_arrow("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")

readr::read_csv("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
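Until HTTPS URLs are supported directly, one possible stopgap is to download the file first and read the local copy. This is a minimal sketch, assuming the installed arrow build has gzip support and infers compression from the file extension; `read_csv_arrow_https()` is a hypothetical helper name, not part of arrow. It performs an eager, full download, so it gives none of the lazy-read benefits requested here.

```r
library(arrow)

# Hypothetical helper (not part of arrow): download a remote CSV to a
# temporary local file, then read it with arrow. This is an eager, full
# download, so it does not provide lazy/partial reads.
read_csv_arrow_https <- function(url, ...) {
  # Keep the full extension (e.g. ".csv.gz") so arrow can infer the
  # compression codec from the local file name (assumption: gzip support
  # is compiled into the installed arrow build).
  ext <- sub("^[^.]*", "", basename(url))
  tmp <- tempfile(fileext = ext)
  on.exit(unlink(tmp), add = TRUE)
  utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
  arrow::read_csv_arrow(tmp, ...)
}

# Usage (needs network access):
# df <- read_csv_arrow_https(
#   "https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
```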

I think this ability would be even more compelling in `open_dataset()`, since 
it would give us the full power of lazy read access. Most servers support 
HTTP range requests, so it seems this should be possible. (We can already do 
something similar from duckdb+R, but only after manually opting in to the HTTP 
extension (httpfs), and only for Parquet.)
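For reference, a range request simply asks the server for a slice of a file; a server that honours it replies with 206 Partial Content instead of sending the whole file, which is what lazy reads would build on. This is a minimal sketch using the curl R package, assuming its `range` handle option (libcurl's CURLOPT_RANGE); `fetch_byte_range()` is a hypothetical helper, not an arrow API.

```r
library(curl)

# Hypothetical helper (not an arrow API): fetch bytes [start, end] of a
# remote file. The `range` handle option maps to libcurl's CURLOPT_RANGE,
# which is sent as an HTTP "Range: bytes=start-end" header for http(s) URLs.
fetch_byte_range <- function(url, start, end) {
  h <- curl::new_handle(range = sprintf("%d-%d", start, end))
  res <- curl::curl_fetch_memory(url, handle = h)
  # For http(s): 206 = range honoured; 200 = server ignored the range
  # and returned the whole file.
  list(status = res$status_code, bytes = res$content)
}

# Usage (needs network access): read the first 1 KiB of the gzipped CSV.
# chunk <- fetch_byte_range(
#   "https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz",
#   0, 1023)
# chunk$status
```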



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
