This is an automated email from the ASF dual-hosted git repository.
jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new c0fe679 ARROW-13400 [R] Update fs.Rmd (Working with S3) vignette
c0fe679 is described below
commit c0fe679aa4369f3f0ea85209d385fdd89d79e3b3
Author: Dewey Dunnington <[email protected]>
AuthorDate: Fri Nov 19 14:14:11 2021 -0600
ARROW-13400 [R] Update fs.Rmd (Working with S3) vignette
Just a few updates and fixes to rough edges according to the notes in
ARROW-13400! In particular,
- Added a section on using `proxy_options`
- Added that you can use `$ls()` to view a directory listing (I found this
useful when testing the S3 proxy server stuff)
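
The two additions described above can be sketched together in a short example. This is a minimal sketch, assuming network access to the public `ursa-labs-taxi-data` bucket used throughout the vignette; the local proxy on port 1316 is hypothetical (it mirrors the example added in the diff below):

```r
library(arrow)

# Point at the public S3 bucket used in the vignette
bucket <- s3_bucket("ursa-labs-taxi-data")

# View a directory listing with $ls()
bucket$ls()          # top-level entries in the bucket
bucket$ls("2019/06") # entries within a subdirectory

# Route S3 requests through a proxy server via proxy_options
# (assumes a proxy is actually listening on localhost:1316)
proxied <- s3_bucket("ursa-labs-taxi-data",
                     proxy_options = "http://localhost:1316")
```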
Closes #11729 from paleolimbot/r-s3-vignette
Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
---
r/vignettes/fs.Rmd | 40 ++++++++++++++++++++++++++++------------
1 file changed, 28 insertions(+), 12 deletions(-)
diff --git a/r/vignettes/fs.Rmd b/r/vignettes/fs.Rmd
index 5d699c4..6990469 100644
--- a/r/vignettes/fs.Rmd
+++ b/r/vignettes/fs.Rmd
@@ -32,7 +32,7 @@ For example, one of the NYC taxi data files used in `vignette("dataset", package
s3://ursa-labs-taxi-data/2019/06/data.parquet
```
-Given this URI, we can pass it to `read_parquet()` just as if it were a local file path:
+Given this URI, you can pass it to `read_parquet()` just as if it were a local file path:
```r
df <- read_parquet("s3://ursa-labs-taxi-data/2019/06/data.parquet")
@@ -54,7 +54,7 @@ This may be convenient when dealing with
long URIs, and it's necessary for some options and authentication methods
that aren't supported in the URI format.
-With a `FileSystem` object, we can point to specific files in it with the `$path()` method.
+With a `FileSystem` object, you can point to specific files in it with the `$path()` method.
In the previous example, this would look like:
```r
@@ -62,13 +62,20 @@ bucket <- s3_bucket("ursa-labs-taxi-data")
df <- read_parquet(bucket$path("2019/06/data.parquet"))
```
-See the help for `FileSystem` for a list of options that `s3_bucket()` and `S3FileSystem$create()`
+You can list the files and/or directories in an S3 bucket or subdirectory using
+the `$ls()` method:
+
+```r
+bucket$ls()
+```
+
+See `help(FileSystem)` for a list of options that `s3_bucket()` and `S3FileSystem$create()`
can take. `region`, `scheme`, and `endpoint_override` can be encoded as query
parameters in the URI (though `region` will be auto-detected in `s3_bucket()`
or from the URI if omitted).
`access_key` and `secret_key` can also be included,
but other options are not supported in the URI.
-The object that `s3_bucket()` returns is technically a `SubTreeFileSystem`, which holds a path and a file system to which it corresponds. `SubTreeFileSystem`s can be useful for holding a reference to a subdirectory somewhere, on S3 or elsewhere.
+The object that `s3_bucket()` returns is technically a `SubTreeFileSystem`, which holds a path and a file system to which it corresponds. `SubTreeFileSystem`s can be useful for holding a reference to a subdirectory somewhere (on S3 or elsewhere).
One way to get a subtree is to call the `$cd()` method on a `FileSystem`
@@ -86,21 +93,30 @@ june2019 <- SubTreeFileSystem$create("s3://ursa-labs-taxi-data/2019/06")
## Authentication
To access private S3 buckets, you typically need two secret parameters:
-a `access_key`, which is like a user id,
-and `secret_key`, like a token.
-There are a few options for passing these credentials:
+an `access_key`, which is like a user id, and `secret_key`, which is like a token
+or password. There are a few options for passing these credentials:
-1. Include them in the URI, like `s3://access_key:secret_key@bucket-name/path/to/file`. Be sure to [URL-encode](https://en.wikipedia.org/wiki/Percent-encoding) your secrets if they contain special characters like "/".
+- Include them in the URI, like `s3://access_key:secret_key@bucket-name/path/to/file`. Be sure to [URL-encode](https://en.wikipedia.org/wiki/Percent-encoding) your secrets if they contain special characters like "/" (e.g., `URLencode("123/456", reserved = TRUE)`).
-2. Pass them as `access_key` and `secret_key` to `S3FileSystem$create()` or `s3_bucket()`
+- Pass them as `access_key` and `secret_key` to `S3FileSystem$create()` or `s3_bucket()`
-3. Set them as environment variables named `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, respectively.
+- Set them as environment variables named `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, respectively.
-4. Define them in a `~/.aws/credentials` file, according to the [AWS documentation](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html).
+- Define them in a `~/.aws/credentials` file, according to the [AWS documentation](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/credentials.html).
-You can also use an [AccessRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
+- Use an [AccessRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html)
for temporary access by passing the `role_arn` identifier to
`S3FileSystem$create()` or `s3_bucket()`.
+## Using a proxy server
+
+If you need to use a proxy server to connect to an S3 bucket, you can provide
+a URI in the form `http://user:password@host:port` to `proxy_options`. For
+example, a local proxy server running on port 1316 can be used like this:
+
+```r
+bucket <- s3_bucket("ursa-labs-taxi-data", proxy_options = "http://localhost:1316")
+```
+
## File systems that emulate S3
The `S3FileSystem` machinery enables you to work with any file system that