[ 
https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471377#comment-17471377
 ] 

Steve Loughran edited comment on HADOOP-18073 at 2/15/22, 6:41 PM:
-------------------------------------------------------------------

That is going to be a pretty traumatic update. Currently we are just moving to 
1.12 in HADOOP-18068.

I believe the API is radically different. One concern is that it drops the 
transfer manager, which we used for copy/rename and for uploading from the local 
FS. I see there is now a preview implementation of that... If it does not 
include any regressions then it should be possible to use it. Otherwise someone 
is going to have to implement parallelized block upload/copy in the S3A code. 

I'm not going to volunteer for this. If you want to contribute it, it is 
certainly something which ultimately we would like. 

In the meantime, S3A does take session credentials. If you can use the SSO 
mechanism and the AWS CLI to generate a set, then you can set the relevant 
properties (ideally in a JCEKS file) and use them for the life of the 
credentials. You will be able to use the session delegation tokens to propagate 
those secrets from your machine to the cluster, so you can deploy the cluster in 
EC2 with lower privileges than the users. You also have the option of providing 
your own AWS credential provider and delegation token implementation. FWIW, some 
of the Cloudera products do exactly this to let someone go from Kerberos auth to 
session credentials for their assigned roles.
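
As a minimal sketch of that, assuming a set of session credentials already 
generated by the AWS CLI (bucket name and secret values below are placeholders; 
in a real deployment they would live in a JCEKS credential store rather than in 
code or plain XML):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ASessionCredentialsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Temporary-credentials provider so the session token is honoured.
    conf.set("fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
    // Session credentials as produced by the AWS CLI / SSO login; placeholders here.
    conf.set("fs.s3a.access.key", "ASIA...");
    conf.set("fs.s3a.secret.key", "...");
    conf.set("fs.s3a.session.token", "...");

    // Any S3A operation now uses the session credentials for their lifetime.
    Path bucket = new Path("s3a://example-bucket/");
    FileSystem fs = FileSystem.get(bucket.toUri(), conf);
    System.out.println(fs.getFileStatus(bucket));
  }
}
{code}

The same three secrets can instead be stored with the hadoop credential CLI and 
picked up via hadoop.security.credential.provider.path, which is the JCEKS route 
mentioned above.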


> Upgrade AWS SDK to v2
> ---------------------
>
>                 Key: HADOOP-18073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18073
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: auth, fs/s3
>    Affects Versions: 3.3.1
>            Reporter: xiaowei sun
>            Priority: Major
>
> We would like to access s3 with AWS SSO, which is supported in 
> software.amazon.awssdk:sdk-core:2.*. 
> In particular, from 
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html],
>  when setting 'fs.s3a.aws.credentials.provider', the value must be a 
> "com.amazonaws.auth.AWSCredentialsProvider". We would like to support 
> "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which 
> supports AWS SSO, so users only need to authenticate once.



