fjetter opened a new issue, #41794:
URL: https://github.com/apache/arrow/issues/41794

   ### Describe the enhancement requested
   
   Given a typical AWS credentials setup that defines an IAM role like the following:
   
   ```
   # ~/.aws/config
   [default]
   region=us-east-2
   role_arn=arn:aws:iam::123456789012:role/RoleName
   source_profile=default
   
   # ~/.aws/credentials
   [default]
   aws_access_key_id=XXXXXXXXXXX
   aws_secret_access_key=YYYYYYYYYYYY
   ```
   
   almost all AWS SDKs correctly interpret this as the `assume-role` method, which generates temporary STS credentials (including a session token).
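To make that interpretation concrete, here is a minimal sketch using only the standard library's `configparser` that recognizes such a profile. `resolve_assume_role` is a hypothetical helper, not part of any SDK. It only handles `source_profile` (not `credential_source`, web identity, etc.), and it stops short of calling STS. An SDK that honors the profile would use the returned long-lived keys only to call STS `AssumeRole` for `role_arn`, then sign S3 requests with the temporary credentials that call returns.

```python
import configparser
from typing import Optional, Tuple


def resolve_assume_role(config_text: str, credentials_text: str,
                        profile: str = "default") -> Optional[Tuple[str, str, str]]:
    """Return (role_arn, access_key_id, secret_access_key) for an assume-role profile.

    Returns None when the profile sets no role_arn, i.e. it is a plain
    static-credentials profile. Only source_profile is handled in this sketch.
    """
    # Use "=" as the only delimiter (ARNs contain ":") and disable "%" interpolation.
    config = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    config.read_string(config_text)
    credentials = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    credentials.read_string(credentials_text)

    # ~/.aws/config names non-default profiles "[profile <name>]".
    section = profile if profile == "default" else f"profile {profile}"
    if not config.has_option(section, "role_arn"):
        return None
    role_arn = config.get(section, "role_arn")
    source = config.get(section, "source_profile")
    # The source profile's long-lived keys are meant only for the STS AssumeRole call.
    return (role_arn,
            credentials.get(source, "aws_access_key_id"),
            credentials.get(source, "aws_secret_access_key"))
```

Fed the two files from the issue, this returns the role ARN together with the `XXXXXXXXXXX`/`YYYYYYYYYYYY` key pair. That key pair is the one the C++ SDK ends up signing with directly.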
   
   For example, in Python this looks like:
   
   ```python
   import boto3

   b3sess = boto3.Session()
   creds = b3sess.get_credentials()
   print({
       "method": creds.method,
       "secret": creds.secret_key[:5] + "...",
       "token": creds.token[:5] + "...",
   })
   # {'method': 'assume-role', 'secret': 'jALbI...', 'token': 'IQoJb...'}
   ```
   
   The C++ SDK's default credentials chain deviates from this behavior. It does not support this kind of configuration and instead uses the plain access key/secret key pair found in the credentials file, which does not necessarily have sufficient permissions.
   
   Dask adopted the `S3FileSystem` as a more performant alternative to the default fsspec filesystem for its Parquet reader, but this lack of support in the C++ SDK is a nasty blocker for further adoption. For our benchmarks we ended up writing a workaround that uses boto to read the credentials and then initializes the [S3FileSystem manually](https://github.com/coiled/benchmarks/blob/934a69e0ed093ef7319a5034b87c03a53dc0c0d8/tests/tpch/conftest.py#L290-L301), but this has a couple of flaws. It is unergonomic and nontrivial, and, more importantly, it prevents the token from being refreshed after it expires (the maximum duration is 1 hour).
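A minimal sketch of that kind of workaround follows. It assumes `boto3` and `pyarrow` are installed. The helper names `s3fs_kwargs` and `make_s3_filesystem_from_boto3` are hypothetical, and this is not the code from the linked benchmark repo. It takes a frozen snapshot of whatever credentials boto3 resolves, including the temporary STS ones from an assume-role profile. It then passes them to `pyarrow.fs.S3FileSystem` through its `access_key`, `secret_key`, `session_token` and `region` arguments.

```python
from typing import Dict, Optional


def s3fs_kwargs(access_key: str, secret_key: str, token: Optional[str],
                region: Optional[str]) -> Dict[str, Optional[str]]:
    """Map a credential snapshot onto pyarrow.fs.S3FileSystem keyword arguments."""
    kwargs = {
        "access_key": access_key,
        "secret_key": secret_key,
        "session_token": token,  # None for plain long-lived keys
    }
    if region is not None:
        kwargs["region"] = region
    return kwargs


def make_s3_filesystem_from_boto3():
    # Local imports: the mapping helper above works without boto3/pyarrow installed.
    import boto3
    from pyarrow import fs

    session = boto3.Session()
    creds = session.get_credentials()  # boto3 resolves the assume-role profile
    if creds is None:
        raise RuntimeError("boto3 could not resolve any AWS credentials")
    # A point-in-time copy: pyarrow never sees boto3's refresh logic,
    # so requests start failing once the temporary STS token expires.
    frozen = creds.get_frozen_credentials()
    return fs.S3FileSystem(**s3fs_kwargs(frozen.access_key, frozen.secret_key,
                                         frozen.token, session.region_name))
```

Because the filesystem only ever receives a static snapshot, the sketch reproduces exactly the refresh problem described above. Nothing renews the session token after it expires.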
   
   There has been some discussion about this on the aws-sdk-cpp repo, including a suggestion to implement an amended credentials chain that includes the `STSProfileCredentialsProvider` (see [here](https://github.com/aws/aws-sdk-cpp/issues/150#issuecomment-538548438)), but it has also been pointed out that this approach is flawed as well.
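The ordering problem behind an "amended credentials chain" can be modeled in a few lines. This is a language-neutral Python sketch, not the aws-sdk-cpp API. `Credentials`, `resolve_chain`, `static_profile` and `assume_role` are hypothetical stand-ins. A chain returns the first credentials any provider yields. So if a provider that reads the profile's static keys runs before one that performs the assume-role step, the role is never assumed.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Credentials(NamedTuple):
    access_key: str
    secret_key: str
    session_token: Optional[str] = None
    method: str = ""


# A provider returns credentials, or None if it has nothing to offer.
Provider = Callable[[], Optional[Credentials]]


def resolve_chain(providers: Iterable[Provider]) -> Optional[Credentials]:
    """Return the credentials of the first provider that yields any."""
    for provider in providers:
        creds = provider()
        if creds is not None:
            return creds
    return None


# Hypothetical providers: static keys from the profile vs. an assume-role step.
def static_profile() -> Credentials:
    return Credentials("XXXXXXXXXXX", "YYYYYYYYYYYY", None, "static-profile")


def assume_role() -> Credentials:
    return Credentials("temp-key", "temp-secret", "temp-token", "assume-role")


print(resolve_chain([static_profile, assume_role]).method)  # static-profile
print(resolve_chain([assume_role, static_profile]).method)  # assume-role
```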
   
   Also related
   - https://github.com/aws/aws-sdk-cpp/issues/2814
   - https://github.com/aws/aws-sdk-cpp/pull/2815
   
   I know this is ultimately an aws-sdk-cpp problem, but end users of the Arrow `S3FileSystem` do not have this visibility and expect things to "just work", particularly when using the Python API, since they are used to how boto and other libraries parse credentials.
   
   cc @pitrou since you've been poking in this area recently
   
   ### Component(s)
   
   C++, Python
