wjones127 commented on code in PR #13601:
URL: https://github.com/apache/arrow/pull/13601#discussion_r927152972


##########
r/vignettes/fs.Rmd:
##########
@@ -1,56 +1,90 @@
 ---
-title: "Working with Cloud Storage (S3)"
+title: "Working with Cloud Storage (S3, GCS)"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Working with Cloud Storage (S3)}
+  %\VignetteIndexEntry{Working with Cloud Storage (S3, GCS)}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
 
 The Arrow C++ library includes a generic filesystem interface and specific
 implementations for some cloud storage systems. This setup allows various
 parts of the project to be able to read and write data with different storage
-backends. In the `arrow` R package, support has been enabled for AWS S3.
-This vignette provides an overview of working with S3 data using Arrow.
+backends. In the `arrow` R package, support has been enabled for AWS S3 and
+Google Cloud Storage (GCS). This vignette provides an overview of working with 
+S3 and GCS data using Arrow.
 
-> In Windows and macOS binary packages, S3 support is included. On Linux when 
-installing from source, S3 support is not enabled by default, and it has 
+> In Windows and macOS binary packages, S3 and GCS support are included. On 
Linux when 
+installing from source, S3 and GCS support is not enabled by default, and it 
has 
 additional system requirements. See `vignette("install", package = "arrow")` 
 for details.
 
 ## URIs
 
 File readers and writers (`read_parquet()`, `write_feather()`, et al.)
-accept an S3 URI as the source or destination file,
-as do `open_dataset()` and `write_dataset()`.
+accept a URI as the source or destination file, as do `open_dataset()` and 
`write_dataset()`.
 An S3 URI looks like:
 
 ```
 s3://[access_key:secret_key@]bucket/path[?region=]
 ```
 
+A GCS URI looks like:
+
+```
+gs://[access_key:secret_key@]bucket/path[?region=]
+gs://anonymous@bucket/path[?region=]
+```
+
 For example, one of the NYC taxi data files used in `vignette("dataset", 
package = "arrow")` is found at
 
 ```
-s3://ursa-labs-taxi-data/2019/06/data.parquet
+s3://voltrondata-labs-datasets/nyc-taxi/year=2019/month=6/data.parquet
+# Or in GCS (anonymous required on public buckets):
+# 
gs://anonymous@voltrondata-labs-datasets/nyc-taxi/year=2019/month=6/data.parquet
 ```
 
 Given this URI, you can pass it to `read_parquet()` just as if it were a local 
file path:
 
 ```r
-df <- read_parquet("s3://ursa-labs-taxi-data/2019/06/data.parquet")
+df <- 
read_parquet("s3://voltrondata-labs-datasets/nyc-taxi/year=2019/month=6/data.parquet")
+# Or in GCS:
+df <- 
read_parquet("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/year=2019/month=6/data.parquet")
 ```
 
 Note that this will be slower to read than if the file were local,
 though if you're running on a machine in the same AWS region as the file in S3,
 the cost of reading the data over the network should be much lower.
 
+### URI options
+
+URIs accept additional options in the query parameters (the part after the `?`)
+that are passed down to configure the underlying file system. They are 
separated 
+by `&`. For example,
+
+```
+s3://voltrondata-labs-datasets/?endpoint_override=https%3A%2F%2Fstorage.googleapis.com&allow_bucket_creation=true
+```
+
+tells the `S3FileSystem` that it should allow the creation of new buckets and 
to 
+talk to Google Storage instead of S3. The latter works because GCS implements 
an 
+S3-compatible API--see [File systems that emulate 
S3](#file-systems-that-emulate-s3) 
+below--but for better support for GCS use the GCSFileSystem with `gs://`.
+
+In GCS, a useful option is `retry_limit_seconds`, which sets the number of 
seconds
+a request may spend retrying before returning an error. The current default is 
+15 minutes, so in many interactive contexts it's nice to set a lower value:
+
+```
+gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=10
+```
+

Review Comment:
   Yeah I wasn't sure I wanted to do this refactor, but it did feel like it 
would almost be easier to introduce `GcsFileSystem$create()` and *then* explain 
options in URIs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to