[ https://issues.apache.org/jira/browse/ARROW-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542229#comment-17542229 ]

Carl Boettiger commented on ARROW-16619:
----------------------------------------

Sounds good!  Eventually it would still be nice for this to work as a stream 
from an https source, e.g. with open_dataset(), allowing us to 
filter-before-downloading or 'read once' into RAM rather than serializing 
to disk first. But going the temp file route for now would still be a very 
nice improvement.
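
For reference, a minimal sketch of the temp-file workaround discussed here, assuming the URL from the issue below is still live (arrow can read a gzipped CSV once it is a local file, even though it cannot open the https URL directly):

```r
# Download the .gz over HTTPS to a temp file, then hand arrow a local
# path, which it can read (and decompress) directly.
url <- "https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz"
tmp <- tempfile(fileext = ".csv.gz")
download.file(url, tmp, mode = "wb")   # "wb" so the gzip bytes survive on Windows
df  <- arrow::read_csv_arrow(tmp)      # gzip is inferred from the .gz extension
unlink(tmp)                            # clean up the temp file
```

This serializes to disk first, which is exactly the limitation the comment above hopes to eventually avoid with a true streaming read.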

> [R] Support compression + R connection (URL with .gz file)
> ----------------------------------------------------------
>
>                 Key: ARROW-16619
>                 URL: https://issues.apache.org/jira/browse/ARROW-16619
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Carl Boettiger
>            Priority: Major
>
> Currently, remote access to data (particularly lazy read, an immensely 
> powerful arrow ability) only works for data in an S3-compliant object store 
> (though I know Azure support is in the works).  It would be really fantastic 
> if we could have remote access over HTTPS (I think this already works on the 
> python side thanks to fsspec).  
> For example, this fails in arrow but works in readr:
> arrow::read_csv_arrow("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
>  
> readr::read_csv("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
> I think this ability would be even more compelling in `open_dataset()`, since 
> it opens up all the power of lazy read access.  Most servers support 
> curl range requests, so it seems this should be possible.  (We can already do 
> something similar from duckdb+R, but only after manually opting in to the http 
> extension, and only for parquet.)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
