This is an automated email from the ASF dual-hosted git repository.
kou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 334b46d349 GH-35409: [Python][Docs] Clarify S3FileSystem Credentials
chain for EC2 (#35312)
334b46d349 is described below
commit 334b46d349af897ba03d7d72be86d23aa5ee8b43
Author: Kevin Liu <[email protected]>
AuthorDate: Mon Jul 31 22:14:27 2023 -0700
GH-35409: [Python][Docs] Clarify S3FileSystem Credentials chain for EC2
(#35312)
### Rationale for this change
When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also
looks at the EC2 Instance Metadata Service.
I want to document this behavior for `pyarrow`. The [`s3fs`
documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mentions this
specific case for EC2.
### What changes are included in this PR?
Documentation for the behavior described above.
#### Technical Details
`S3FileSystem` uses the
[`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317)
option when no credentials are passed into the constructor. It uses the
[`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213).
The C++ implementation of
[`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html)
not only [reads the environment
variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html)
when trying to resolve AWS credentials, but also [looks at profile
config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w
[...]
### Are these changes tested?
No, just documentation changes
### Are there any user-facing changes?
Yes, changing public documentation
* Closes: #35409
### Render Changes
Render the changes locally via [Building the
doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs):
`docs/source/python/filesystems.rst`:

`python/pyarrow/_s3fs.pyx`:

Lead-authored-by: Kevin Liu <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
---
docs/source/python/filesystems.rst | 5 +++--
python/pyarrow/_s3fs.pyx | 12 +++++++++---
2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/docs/source/python/filesystems.rst b/docs/source/python/filesystems.rst
index 40656f6b76..3fc10dc771 100644
--- a/docs/source/python/filesystems.rst
+++ b/docs/source/python/filesystems.rst
@@ -153,8 +153,9 @@ PyArrow implements natively a S3 filesystem for S3 compatible storage.
The :class:`S3FileSystem` constructor has several options to configure the S3
connection (e.g. credentials, the region, an endpoint override, etc). In
addition, the constructor will also inspect configured S3 credentials as
-supported by AWS (for example the ``AWS_ACCESS_KEY_ID`` and
-``AWS_SECRET_ACCESS_KEY`` environment variables).
+supported by AWS (such as the ``AWS_ACCESS_KEY_ID`` and
+``AWS_SECRET_ACCESS_KEY`` environment variables, AWS configuration files,
+and EC2 Instance Metadata Service for EC2 nodes).
Example how you can read contents from a S3 bucket::
diff --git a/python/pyarrow/_s3fs.pyx b/python/pyarrow/_s3fs.pyx
index e76c7b9ffa..51c248d147 100644
--- a/python/pyarrow/_s3fs.pyx
+++ b/python/pyarrow/_s3fs.pyx
@@ -140,14 +140,20 @@ cdef class S3FileSystem(FileSystem):
"""
S3-backed FileSystem implementation
- If neither access_key nor secret_key are provided, and role_arn is also not
- provided, then attempts to initialize from AWS environment variables,
- otherwise both access_key and secret_key must be provided.
+ AWS access_key and secret_key can be provided explicitly.
If role_arn is provided instead of access_key and secret_key, temporary
credentials will be fetched by issuing a request to STS to assume the
specified role.
+ If neither access_key nor secret_key are provided, and role_arn is also not
+ provided, then attempts to establish the credentials automatically.
+ S3FileSystem will try the following methods, in order:
+
+ * ``AWS_ACCESS_KEY_ID``, ``AWS_SECRET_ACCESS_KEY``, and ``AWS_SESSION_TOKEN`` environment variables
+ * configuration files such as ``~/.aws/credentials`` and ``~/.aws/config``
+ * for nodes on Amazon EC2, the EC2 Instance Metadata Service
+
Note: S3 buckets are special and the operations available on them may be
limited or more expensive than desired.
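
The lookup order listed in the updated docstring above can be sketched in plain Python. This is only an illustration of the documented order, not how `pyarrow` or the AWS SDK resolves credentials internally: `first_configured_source` is a hypothetical helper, it checks only whether the environment variables are set and whether the `~/.aws` files exist, and it never contacts the EC2 Instance Metadata Service.

```python
import os
from pathlib import Path


def first_configured_source(env=None, home=None):
    """Name the first credential source, in the documented order, that looks configured.

    Hypothetical helper for illustration; it does not call pyarrow or the AWS SDK.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. Environment variables: an access key and a secret key are both needed.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"

    # 2. Configuration files under ~/.aws (only their existence is checked here).
    for name in ("credentials", "config"):
        if (home / ".aws" / name).is_file():
            return "configuration files"

    # 3. Last in the documented order, for EC2 nodes; this sketch never contacts it.
    return "EC2 Instance Metadata Service"


if __name__ == "__main__":
    print(first_configured_source())
```

Running the script on a given host prints which of the three sources would likely be picked up first when `S3FileSystem` is constructed without `access_key`, `secret_key`, or `role_arn`. The `env` and `home` parameters exist only so the order can be exercised without touching the real environment.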