[
https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638319#comment-17638319
]
ASF GitHub Bot commented on HADOOP-18073:
-----------------------------------------
passaro opened a new pull request, #5163:
URL: https://github.com/apache/hadoop/pull/5163
### Description of PR
This is an initial draft PR containing all the changes implemented so far to
upgrade S3A to the AWS SDK v2. Note that this is still a work in progress and
we plan to further contribute to it to fill existing gaps and update the SDK
when missing features are released (e.g. support for Client-side Encryption and
public release of the new Transfer Manager, currently in preview).
In the meantime, this PR should provide a view of the whole set of changes
and start a conversation on the remaining open questions and on how to handle
breaking changes that affect S3A.
The new document at
`hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_v2_changelog.md`
discusses the key changes contained in this PR and is the suggested starting
point for the review.
Further open questions to be discussed:
1. The region logic. Previously, if an endpoint was configured but no region,
we parsed the region from the endpoint; if the configured endpoint was the
standard us-east-1 endpoint, we set the region to null and let the SDK work it
out; if no endpoint was configured, we set the region to us-east-1 and enabled
`.withForceGlobalBucketAccessEnabled`. SDK v2 has no cross-region access, so
the correct region of the bucket must be set explicitly. We therefore now
obtain the bucket's region with a HeadBucket call and set it. In general, the
guidance for the new SDK is to set only the region and let the SDK determine
the endpoint.
2. Bucket probes. Currently done with doesBucketExist and doesBucketExistV2.
Why do we need these two separate levels? There is no doesBucketExist operation
in SDK v2; it will need to be replaced with a HeadBucket/GetBucketACL call.
Also note that, with the new region logic, we will need to issue a HeadBucket
while configuring the client if the region isn't specified.
3. Progress Listeners. SDK v2 currently does not support attaching progress
listeners to requests outside the Transfer Manager. We use them on PutObject
and UploadPart in `S3ABlockOutputStream`. Are they required for the upgrade?
4. ACLs. LogDeliveryWrite, which is a bucket level ACL, is no longer
supported in the SDK V2. S3A seems to use ACLs at the object level only. Can
this ACL be removed?
5. Transfer Manager. You can no longer set the threshold at which the
Transfer Manager is used; the default is 8 MB.
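The v1-style endpoint-to-region inference described in question 1 can be
sketched roughly as follows. `parseRegionFromEndpoint` is a hypothetical
helper for illustration, not the actual S3A code, and the real logic handles
more endpoint shapes:

```java
// Sketch of the v1-style region inference (hypothetical helper,
// not the actual S3A implementation).
public class RegionFromEndpoint {
  static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

  /**
   * Returns the region parsed from an endpoint such as
   * "s3.eu-west-2.amazonaws.com", or null for the standard us-east-1
   * endpoint, signalling that the SDK should resolve the region itself.
   */
  public static String parseRegionFromEndpoint(String endpoint) {
    if (endpoint == null || endpoint.isEmpty()
        || endpoint.equals(CENTRAL_ENDPOINT)) {
      return null; // let the SDK figure out the region
    }
    String[] parts = endpoint.split("\\.");
    // expect s3.<region>.amazonaws.com or the legacy s3-<region> form
    if (parts.length >= 3 && parts[0].equals("s3")) {
      return parts[1];
    }
    if (parts[0].startsWith("s3-")) {
      return parts[0].substring(3);
    }
    return null;
  }
}
```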
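For question 2, a HeadBucket-based replacement could follow the shape below.
The `HeadBucketCall` interface and exception type are illustrative stand-ins
for the SDK v2 `S3Client.headBucket` call and its not-found exception, so the
probe logic can be shown without the SDK on the classpath:

```java
// Illustrative probe pattern: map a HeadBucket call's outcome to a
// does-the-bucket-exist answer. HeadBucketCall stands in for the SDK v2
// S3Client.headBucket(...) invocation; the exception types are simplified.
public class BucketProbe {

  /** Thrown by the stand-in call when the bucket does not exist (404). */
  public static class NoSuchBucketException extends RuntimeException {}

  /** Stand-in for the real headBucket(HeadBucketRequest) call. */
  public interface HeadBucketCall {
    void headBucket(String bucket) throws NoSuchBucketException;
  }

  /**
   * Returns true if HeadBucket succeeds, false on a 404; any other
   * failure (403, network error) propagates to the caller, as S3A
   * would surface it.
   */
  public static boolean bucketExists(HeadBucketCall s3, String bucket) {
    try {
      s3.headBucket(bucket);
      return true;
    } catch (NoSuchBucketException e) {
      return false;
    }
  }
}
```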
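On question 3, if request-level listeners remain unavailable, one possible
workaround (a sketch only, not what S3A does today) is to count bytes as they
are read from the request body stream and report them to a callback:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongConsumer;

// Sketch: byte-counting stream wrapper as a stand-in for the missing
// request-level progress listeners. The SDK would read the upload body
// through this wrapper, and the callback receives incremental counts.
public class CountingInputStream extends FilterInputStream {
  private final LongConsumer onBytes;

  public CountingInputStream(InputStream in, LongConsumer onBytes) {
    super(in);
    this.onBytes = onBytes;
  }

  @Override
  public int read() throws IOException {
    int b = in.read();
    if (b >= 0) {
      onBytes.accept(1);
    }
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int n = in.read(buf, off, len);
    if (n > 0) {
      onBytes.accept(n);
    }
    return n;
  }
}
```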
### How was this patch tested?
Ran `mvn -Dparallel-tests -DtestsThreadCount=8 clean verify` against `eu-west-2`.
The following tests are currently failing:
| Test Suite | Test Name | Reason |
|------------|-----------|--------|
> Upgrade AWS SDK to v2
> ---------------------
>
> Key: HADOOP-18073
> URL: https://issues.apache.org/jira/browse/HADOOP-18073
> Project: Hadoop Common
> Issue Type: Task
> Components: auth, fs/s3
> Affects Versions: 3.3.1
> Reporter: xiaowei sun
> Assignee: Ahmar Suhail
> Priority: Major
> Labels: pull-request-available
> Attachments: Upgrading S3A to SDKV2.pdf
>
>
> This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java
> V1 to AWS SDK for Java V2.
> Original use case:
> {quote}We would like to access s3 with AWS SSO, which is supported in
> software.amazon.awssdk:sdk-core:2.*.
> In particular, from
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html],
> when setting 'fs.s3a.aws.credentials.provider', it must be
> "com.amazonaws.auth.AWSCredentialsProvider". We would like to support
> "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which
> supports AWS SSO, so users only need to authenticate once.
> {quote}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]