fjetter opened a new issue, #41794:
URL: https://github.com/apache/arrow/issues/41794
### Describe the enhancement requested
Given a typical AWS credentials setup that defines IAM roles like the
following
```ini
# ~/.aws/config
[default]
region=us-east-2
role_arn=arn:aws:iam::123456789012:role/RoleName
source_profile=default

# ~/.aws/credentials
[default]
aws_access_key_id=XXXXXXXXXXX
aws_secret_access_key=YYYYYYYYYYYY
```
almost all AWS SDKs correctly interpret this as an `assume-role` method that
generates a temporary STS token pair.
For example, with boto3 in Python:
```python
import boto3

# Resolve credentials through boto3's default credential chain
b3sess = boto3.Session()
creds = b3sess.get_credentials()
print({
    "method": creds.method,
    "secret": creds.secret_key[:5] + "...",
    "token": creds.token[:5] + "...",
})
# {'method': 'assume-role', 'secret': 'jALbI...', 'token': 'IQoJb...'}
```
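What makes boto3 report `assume-role` here is a profile that declares both `role_arn` and `source_profile`. The sketch below is a simplified, hypothetical illustration of that check using Python's standard `configparser`; the function name `assume_role_profiles` is made up, and real SDKs also honor `credential_source`, web identity, SSO and other settings, so this is not how any SDK actually resolves credentials.

```python
import configparser


def assume_role_profiles(config_text):
    """Hypothetical helper: map profile name -> (role_arn, source_profile).

    Only detects profiles that declare both role_arn and source_profile;
    other assume-role triggers (credential_source, web identity, ...) are ignored.
    """
    # interpolation=None so values containing '%' are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    result = {}
    for section in parser.sections():
        # ~/.aws/config names non-default profiles "[profile <name>]"
        if section.startswith("profile "):
            name = section[len("profile "):]
        else:
            name = section
        opts = parser[section]
        if "role_arn" in opts and "source_profile" in opts:
            result[name] = (opts["role_arn"], opts["source_profile"])
    return result
```

Applied to the `~/.aws/config` shown above, this returns `{'default': ('arn:aws:iam::123456789012:role/RoleName', 'default')}`.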
The C++ SDK deviates from how the default credentials chain is implemented
elsewhere: it does not support this kind of configuration and instead uses the
plain access key + secret key pair found in the configuration, which does not
necessarily have sufficient permissions.
Dask adopted `S3FileSystem` as a more performant alternative to the existing
default fsspec filesystem for its Parquet reader, but this lack of support in
the C++ SDK is a nasty blocker for further adoption. For our benchmarks we ended
up writing a workaround that uses boto to read the credentials and initializes
the [S3FileSystem
manually](https://github.com/coiled/benchmarks/blob/934a69e0ed093ef7319a5034b87c03a53dc0c0d8/tests/tpch/conftest.py#L290-L301),
but this has a couple of flaws. For starters, it is unergonomic and
nontrivial, but more importantly it prevents the token from being refreshed
after it expires (the maximum duration is 1 hour).
There has been some discussion about this on the aws-sdk-cpp repo, including a
[suggestion](https://github.com/aws/aws-sdk-cpp/issues/150#issuecomment-538548438)
to implement an amended credentials chain that includes
`STSProfileCredentialsProvider`, but it was also pointed out that this approach
is flawed as well.
Also related
- https://github.com/aws/aws-sdk-cpp/issues/2814
- https://github.com/aws/aws-sdk-cpp/pull/2815
I know this is ultimately an aws-sdk-cpp problem, but end users of the Arrow
`S3FileSystem` do not have this visibility and expect things to "just work",
particularly when using the Python API, since they are used to how boto and
other libraries parse credentials.
cc @pitrou since you've been poking in this area recently
### Component(s)
C++, Python