This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 334b46d349 GH-35409: [Python][Docs] Clarify S3FileSystem Credentials chain for EC2 (#35312)
334b46d349 is described below

commit 334b46d349af897ba03d7d72be86d23aa5ee8b43
Author: Kevin Liu <[email protected]>
AuthorDate: Mon Jul 31 22:14:27 2023 -0700

    GH-35409: [Python][Docs] Clarify S3FileSystem Credentials chain for EC2 (#35312)
    
    
    
    ### Rationale for this change
    
    When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service.
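    To make the EC2 case concrete, here is a minimal sketch of the two IMDSv2 requests such a lookup is built on: a `PUT` for a session token, then a `GET` for the instance role's temporary credentials. It only builds `urllib.request.Request` objects and never sends them; the helper names are hypothetical, and the SDK's real provider adds details such as retries and caching that are not modeled here.

```python
import urllib.request

# Link-local address of the EC2 Instance Metadata Service (IMDS).
IMDS_BASE = "http://169.254.169.254"


def build_imds_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build (but do not send) the IMDSv2 session-token request."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_imds_credentials_request(token: str, role_name: str) -> urllib.request.Request:
    """Build (but do not send) the request for the role's temporary credentials."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/iam/security-credentials/{role_name}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )
```

    On an EC2 node, passing these to `urllib.request.urlopen(...)` would return a token, then (after a `GET` on `/latest/meta-data/iam/security-credentials/` to list the role name) a JSON document with `AccessKeyId`, `SecretAccessKey`, `Token`, and `Expiration`. Elsewhere the link-local address is simply unreachable.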
    
    I want to document this behavior for `pyarrow`. The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mentions this specific case for EC2.
    
    ### What changes are included in this PR?
    
    Documentation for the behavior described above.
    
    #### Technical Details
    `S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor. These default options use the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213).
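    The constructor-level decision can be sketched as a small pure-Python model. `credentials_mode` is a hypothetical helper, not a pyarrow API: it mirrors the documented rules (explicit keys, `role_arn` for STS, otherwise the default chain), and rejecting `role_arn` combined with explicit keys is an assumption of the sketch.

```python
from typing import Optional


def credentials_mode(
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    role_arn: Optional[str] = None,
) -> str:
    """Hypothetical model of how S3FileSystem chooses a credentials strategy."""
    if access_key is not None or secret_key is not None:
        # Explicit credentials: both halves of the key pair are required.
        if access_key is None or secret_key is None:
            raise ValueError("access_key and secret_key must be provided together")
        # Assumption of this sketch: explicit keys and role_arn are exclusive.
        if role_arn is not None:
            raise ValueError("pass either explicit keys or role_arn, not both")
        return "explicit"
    if role_arn is not None:
        # Temporary credentials are fetched from STS by assuming the role.
        return "assume_role"
    # Nothing passed: defer to the AWS SDK default credentials provider chain.
    return "default_chain"


print(credentials_mode())                          # default_chain
print(credentials_mode("AKIAEXAMPLE", "secret"))   # explicit
```

    In pyarrow the three cases correspond to `pyarrow.fs.S3FileSystem(access_key=..., secret_key=...)`, `S3FileSystem(role_arn=...)`, and a bare `S3FileSystem()`.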
    
    The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads the environment variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w
 [...]
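    The lookup order described above can be modeled as follows. This is a simplified, hypothetical sketch (`resolve_source` is not an SDK or pyarrow function): it covers only the three sources the updated docs list, parses only a credentials file (not `~/.aws/config`), assumes the profile is chosen by `AWS_PROFILE`, and ignores other providers that the SDK chain may include depending on version (for example process-based or container credentials).

```python
import configparser
from typing import Mapping, Optional


def resolve_source(
    env: Mapping[str, str],
    credentials_file_text: Optional[str] = None,
    on_ec2_with_role: bool = False,
) -> str:
    """Hypothetical, simplified model of the default credentials lookup order."""
    # 1. Environment variables (AWS_SESSION_TOKEN is optional on top of these).
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    # 2. Shared credentials file in INI format (e.g. ~/.aws/credentials),
    #    using the profile named by AWS_PROFILE, else "default" (assumption).
    if credentials_file_text:
        parser = configparser.ConfigParser()
        parser.read_string(credentials_file_text)
        profile = env.get("AWS_PROFILE", "default")
        if parser.has_option(profile, "aws_access_key_id") and parser.has_option(
            profile, "aws_secret_access_key"
        ):
            return f"profile:{profile}"
    # 3. EC2 Instance Metadata Service (only reachable from EC2 nodes).
    if on_ec2_with_role:
        return "ec2_instance_metadata"
    return "none"


creds = "[default]\naws_access_key_id = AKIAEXAMPLE\naws_secret_access_key = secret\n"
print(resolve_source({}, creds))  # profile:default
```

    The first source that yields a complete key pair wins, which is why exported environment variables take precedence over a credentials file, and the instance metadata service is consulted only when neither is present.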
    
    ### Are these changes tested?
    
    No, just documentation changes
    
    ### Are there any user-facing changes?
    
    Yes, changing public documentation
    
    * Closes: #35409
    
    ### Render Changes
    Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs):
    `docs/source/python/filesystems.rst`:
    ![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6)
    
    `python/pyarrow/_s3fs.pyx`:
    ![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d)
    
    Lead-authored-by: Kevin Liu <[email protected]>
    Co-authored-by: Sutou Kouhei <[email protected]>
    Signed-off-by: Sutou Kouhei <[email protected]>
---
 docs/source/python/filesystems.rst |  5 +++--
 python/pyarrow/_s3fs.pyx           | 12 +++++++++---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/docs/source/python/filesystems.rst b/docs/source/python/filesystems.rst
index 40656f6b76..3fc10dc771 100644
--- a/docs/source/python/filesystems.rst
+++ b/docs/source/python/filesystems.rst
@@ -153,8 +153,9 @@ PyArrow implements natively a S3 filesystem for S3 compatible storage.
 The :class:`S3FileSystem` constructor has several options to configure the S3
 connection (e.g. credentials, the region, an endpoint override, etc). In
 addition, the constructor will also inspect configured S3 credentials as
-supported by AWS (for example the ``AWS_ACCESS_KEY_ID`` and
-``AWS_SECRET_ACCESS_KEY`` environment variables).
+supported by AWS (such as the ``AWS_ACCESS_KEY_ID`` and
+``AWS_SECRET_ACCESS_KEY`` environment variables, AWS configuration files,
+and EC2 Instance Metadata Service for EC2 nodes).
 
 
 Example how you can read contents from a S3 bucket::
diff --git a/python/pyarrow/_s3fs.pyx b/python/pyarrow/_s3fs.pyx
index e76c7b9ffa..51c248d147 100644
--- a/python/pyarrow/_s3fs.pyx
+++ b/python/pyarrow/_s3fs.pyx
@@ -140,14 +140,20 @@ cdef class S3FileSystem(FileSystem):
     """
     S3-backed FileSystem implementation
 
-    If neither access_key nor secret_key are provided, and role_arn is also not
-    provided, then attempts to initialize from AWS environment variables,
-    otherwise both access_key and secret_key must be provided.
+    AWS access_key and secret_key can be provided explicitly.
 
     If role_arn is provided instead of access_key and secret_key, temporary
     credentials will be fetched by issuing a request to STS to assume the
     specified role.
 
+    If neither access_key nor secret_key are provided, and role_arn is also not
+    provided, then attempts to establish the credentials automatically.
+    S3FileSystem will try the following methods, in order:
+
+    * ``AWS_ACCESS_KEY_ID``, ``AWS_SECRET_ACCESS_KEY``, and ``AWS_SESSION_TOKEN`` environment variables
+    * configuration files such as ``~/.aws/credentials`` and ``~/.aws/config``
+    * for nodes on Amazon EC2, the EC2 Instance Metadata Service
+
     Note: S3 buckets are special and the operations available on them may be
     limited or more expensive than desired.
 
