Carl Boettiger created ARROW-14998:
--------------------------------------
Summary: Support for HTTPS Filesystem access (from R client)
Key: ARROW-14998
URL: https://issues.apache.org/jira/browse/ARROW-14998
Project: Apache Arrow
Issue Type: Wish
Components: R
Reporter: Carl Boettiger
Thanks for such an amazing project. I've been entirely blown away by the S3
Filesystem access in the latest release, and I'm excited to see other backends
like Azure being discussed in the issues. As you know, many HTTPS servers also
support range requests, meaning (I think) that it should be possible to access
public data (Parquet, CSV files) over generic HTTPS connections too.
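To make the range-request idea concrete, here is a minimal Python sketch of the
byte ranges a client might ask for when reading only the footer of a remote
Parquet file. It relies on the Parquet layout, where a file ends with a 4-byte
little-endian metadata length followed by the magic bytes "PAR1". The function
names are hypothetical, and this is not how arrow or duckdb implement remote
reads; it only illustrates why range support on the server is enough.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # trailing magic bytes of a Parquet file


def suffix_range_header(n_bytes: int) -> dict:
    """Header asking the server for the last n_bytes of a resource."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return {"Range": f"bytes=-{n_bytes}"}


def footer_length(tail: bytes) -> int:
    """Read the metadata length from the final 8 bytes of a Parquet file."""
    if len(tail) < 8 or tail[-4:] != PARQUET_MAGIC:
        raise ValueError("tail does not look like the end of a Parquet file")
    (length,) = struct.unpack("<I", tail[-8:-4])
    return length


def metadata_range_header(file_size: int, metadata_len: int) -> dict:
    """Header asking for just the metadata block that precedes the 8-byte tail."""
    start = file_size - 8 - metadata_len
    if metadata_len <= 0 or start < 0:
        raise ValueError("metadata length inconsistent with file size")
    end = file_size - 9  # HTTP byte ranges are inclusive
    return {"Range": f"bytes={start}-{end}"}
```

These headers could be passed to any HTTP client, for example
urllib.request.Request(url, headers=...). A server that honours the request
answers with 206 Partial Content, so only the footer and the needed column
chunks have to be downloaded rather than the whole file.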
As you probably know, duckdb already has support for HTTPS-based remote file
access, e.g.
https://github.com/duckdb/duckdb/blob/master/test/sql/copy/parquet/test_parquet_remote.test
(though it is not available out-of-the-box in the R client there either).
It would be wonderful to have a similar remote filesystem access that could
work over HTTPS like that in arrow. (I gather on the Python side, fsspec
already gives access to a wide range of such abstractions, but we're more
limited in R so far, except for the geospatial data, where bindings to GDAL
mean we can access GDAL's rather amazing virtual file systems over https, S3,
FTP, etc, [https://gdal.org/user/virtual_file_systems.html] – a nice array-data
complement to the more database-oriented workflow of arrow...).
Thanks for considering!
--
This message was sent by Atlassian Jira
(v8.20.1#820001)