[
https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471377#comment-17471377
]
Steve Loughran edited comment on HADOOP-18073 at 2/15/22, 6:41 PM:
-------------------------------------------------------------------
that is going to be a pretty traumatic update. Currently we are just moving to
1.12 in HADOOP-18068.
I believe the API is radically different. One concern is that it drops the
transfer manager, which we use for copy/rename and for uploading from the local
FS. I see there is now a preview implementation of that... If it does not
introduce any regressions then it should be possible to use. Otherwise someone
is going to have to implement the parallelized block upload/copy in the S3A
code.
I'm not going to volunteer for this. If you want to contribute it, it is
certainly something we would ultimately like.
In the meantime, S3A does take session credentials. If you can use the SSO
mechanism and the AWS CLI to generate a set, then you can set the relevant
properties (ideally in a JCEKS file; sketch below) and use them for the life of
the credentials. You will be able to use the session delegation tokens to
propagate those secrets from your machine to the cluster, so you can deploy the
cluster in EC2 with lower privileges than the users. You also have the option
of providing your own AWS credential provider and delegation token
implementation. FWIW some of the Cloudera products do exactly this to let
someone go from Kerberos auth to session credentials for their assigned roles.
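To illustrate (a minimal sketch only; the values are placeholders): the
credentials the CLI hands back map onto the standard S3A temporary-credential
properties, read by TemporaryAWSCredentialsProvider:
{code:xml}
<!-- sketch: session credentials from the AWS CLI/SSO wired into S3A -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>...temporary access key...</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>...temporary secret key...</value>
</property>
<property>
  <name>fs.s3a.session.token</name>
  <value>...session token from the CLI...</value>
</property>
{code}
Rather than putting these in core-site.xml, the same entries can be created in
a JCEKS credential file with {{hadoop credential create fs.s3a.access.key
-provider jceks://file/...}} (and likewise for the secret key and session
token), which keeps the secrets out of plain-text configuration.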
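On the delegation token side, the session token binding already ships in S3A;
something like the following (again a sketch, property name as in the 3.3.x
hadoop-aws delegation token docs) is what lets the submitter's session secrets
be marshalled as tokens and shipped to the cluster with the job:
{code:xml}
<!-- sketch: have S3A issue delegation tokens carrying the session credentials -->
<property>
  <name>fs.s3a.delegation.token.binding</name>
  <value>org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding</value>
</property>
{code}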
> Upgrade AWS SDK to v2
> ---------------------
>
> Key: HADOOP-18073
> URL: https://issues.apache.org/jira/browse/HADOOP-18073
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: auth, fs/s3
> Affects Versions: 3.3.1
> Reporter: xiaowei sun
> Priority: Major
>
> We would like to access s3 with AWS SSO, which is supported in
> software.amazon.awssdk:sdk-core:2.*.
> In particular, from
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html],
> when setting 'fs.s3a.aws.credentials.provider', it must be
> "com.amazonaws.auth.AWSCredentialsProvider". We would like to support
> "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which
> supports AWS SSO, so users only need to authenticate once.