passaro opened a new pull request, #5163:
URL: https://github.com/apache/hadoop/pull/5163
### Description of PR
This is an initial draft PR containing all the changes implemented so far to
upgrade S3A to the AWS SDK v2. It is still a work in progress: we plan to keep
contributing to fill the remaining gaps, and to update the SDK as missing
features are released (e.g. client-side encryption support and the public
release of the new Transfer Manager, currently in preview).
In the meantime, this PR gives a view of the whole set of changes and is meant
to start a conversation on the remaining open questions and on how to handle
the breaking changes that affect S3A.
The new document at
`hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_v2_changelog.md`
discusses the key changes contained in this PR and is the suggested starting
point for the review.
Further open questions to be discussed:
1. The region logic. Previously, if an endpoint was configured but no region
was, the region was parsed from the endpoint. If the configured endpoint was
the standard us-east-1 endpoint, the region was set to null and the SDK
determined it. If no endpoint was configured, the region was set to us-east-1
and `.withForceGlobalBucketAccessEnabled` was set. SDK v2 has no cross-region
access, so the bucket's correct region must be set. We therefore now look up
the bucket's region with a HeadBucket request and set it. In general, the
guidance for the new SDK is to set only the region and let the SDK determine
the endpoint.
2. Bucket probes. These are currently done with `doesBucketExist` and
`doesBucketExistV2`. Why do we need these two separate levels? SDK v2 has no
`doesBucketExist` operation, so it will need to be replaced with a HeadBucket
or GetBucketAcl request. Also consider that, with the new region logic, we will
need to issue a HeadBucket request while configuring the client if the region
isn't specified.
3. Progress listeners. SDK v2 currently does not support attaching progress
listeners to requests made outside the Transfer Manager. We use them for Put
and UploadPart in `S3ABlockOutputStream`. Are they required for the upgrade?
4. ACLs. `LogDeliveryWrite`, which is a bucket-level ACL, is no longer
supported in SDK v2. S3A appears to use ACLs at the object level only. Can
this ACL be removed?
5. Transfer Manager. The threshold that controls when the Transfer Manager is
used can no longer be configured; the default is 8MB.
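To make the region logic in item 1 concrete, here is a minimal sketch of the old (SDK v1) rules for the case where no region is configured, next to the new (SDK v2) rule. It uses no SDK types: `RegionResolution`, its methods, the endpoint regex, and the choice to treat unparseable endpoints as "let the SDK decide" are all hypothetical, and the HeadBucket call is stubbed out as a `Supplier`.

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the region selection described in item 1. */
public class RegionResolution {

  /** Standard us-east-1 endpoint (assumed spelling). */
  public static final String US_EAST_1_ENDPOINT = "s3.amazonaws.com";

  /** Matches regional endpoints such as "s3.eu-west-2.amazonaws.com" or "s3-eu-west-2.amazonaws.com". */
  private static final Pattern REGIONAL_ENDPOINT =
      Pattern.compile("^s3[.-]([a-z0-9-]+)\\.amazonaws\\.com$");

  /**
   * SDK v1 rules, only for the case where no region is configured.
   * An empty result means "region left null, the SDK works it out".
   */
  public static Optional<String> v1RegionWhenUnset(String endpoint) {
    if (endpoint == null || endpoint.isEmpty()) {
      // No endpoint: us-east-1, and v1 also enabled forced global bucket access.
      return Optional.of("us-east-1");
    }
    if (endpoint.equals(US_EAST_1_ENDPOINT)) {
      return Optional.empty();
    }
    Matcher m = REGIONAL_ENDPOINT.matcher(endpoint);
    // Assumption of this sketch: endpoints the simple pattern cannot parse
    // also fall back to "SDK decides".
    return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
  }

  /**
   * SDK v2 rule: there is no cross-region access, so an unset region must be
   * discovered from the bucket itself, e.g. with a HeadBucket request.
   */
  public static String v2Region(String configuredRegion, Supplier<String> headBucketRegionLookup) {
    if (configuredRegion != null && !configuredRegion.isEmpty()) {
      return configuredRegion; // an explicit region needs no network call
    }
    return headBucketRegionLookup.get();
  }
}
```

In a real client the lookup would issue a HeadBucket request and read the bucket's region from the response (for example the `x-amz-bucket-region` header), which is the extra call during client configuration that item 2 mentions.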
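For the bucket-probe question in item 2, one way to see what a single HeadBucket-based probe could report is to map its outcome to a small set of states. This is a sketch under assumptions: the status-code mapping (200 exists, 301 exists in another region, 403 exists but access is denied, 404 not found) and the `x-amz-bucket-region` header should be checked against the S3 API reference, and `BucketProbe` is a hypothetical helper working on plain status codes and headers rather than SDK types.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of a HeadBucket-based existence probe. */
public class BucketProbe {

  public enum BucketState { EXISTS, EXISTS_IN_OTHER_REGION, EXISTS_ACCESS_DENIED, NOT_FOUND, UNKNOWN }

  /** Assumed mapping of HeadBucket HTTP status codes to bucket states. */
  public static BucketState classify(int httpStatus) {
    switch (httpStatus) {
      case 200: return BucketState.EXISTS;
      case 301: return BucketState.EXISTS_IN_OTHER_REGION;
      case 403: return BucketState.EXISTS_ACCESS_DENIED;
      case 404: return BucketState.NOT_FOUND;
      default:  return BucketState.UNKNOWN;
    }
  }

  /**
   * Yes/no answer for callers that only need existence. Statuses outside the
   * assumed mapping are surfaced as errors rather than guessed.
   */
  public static boolean exists(int httpStatus) {
    BucketState state = classify(httpStatus);
    if (state == BucketState.UNKNOWN) {
      throw new IllegalStateException("Unexpected HeadBucket status: " + httpStatus);
    }
    return state != BucketState.NOT_FOUND;
  }

  /** Region reported in the response headers, matched case-insensitively like any HTTP header. */
  public static Optional<String> bucketRegion(Map<String, String> headers) {
    for (Map.Entry<String, String> e : headers.entrySet()) {
      if (e.getKey().equalsIgnoreCase("x-amz-bucket-region")) {
        return Optional.of(e.getValue());
      }
    }
    return Optional.empty();
  }
}
```

Keeping the richer `BucketState` alongside the boolean `exists` is one possible answer to the "two levels" question: callers that only need existence use the boolean, while client setup can reuse the same response to pick the region.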
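Regarding item 3, if progress reporting for Put and UploadPart turns out to be required, one hypothetical workaround is to wrap the request body stream and count bytes as they are read. The sketch below assumes the upload body can be supplied as an `InputStream`; `ProgressReportingInputStream` and its callback are illustrative names, not part of Hadoop or the SDK.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongConsumer;

/** Hypothetical stream wrapper that reports cumulative bytes read. */
public class ProgressReportingInputStream extends FilterInputStream {

  private final LongConsumer listener; // receives the running total after each read
  private long total;

  public ProgressReportingInputStream(InputStream in, LongConsumer listener) {
    super(in);
    this.listener = listener;
  }

  @Override
  public int read() throws IOException {
    int b = super.read();
    if (b >= 0) {
      report(1);
    }
    return b;
  }

  // FilterInputStream.read(byte[]) delegates here, so both bulk reads are counted.
  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int n = super.read(buf, off, len);
    if (n > 0) {
      report(n);
    }
    return n;
  }

  private void report(long n) {
    total += n;
    listener.accept(total);
  }

  public long bytesTransferred() {
    return total;
  }
}
```

Bytes skipped with `skip()` are not counted, and if the body is re-read after `mark`/`reset` (for example on a retry) those bytes are counted again, so the reported total can exceed the object size; a real implementation would need to account for both.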
### How was this patch tested?
Ran `mvn -Dparallel-tests -DtestsThreadCount=8 clean verify` against `eu-west-2`.
The following tests are currently failing:
| Test Suite | Test Name | Reason |
| --- | --- | --- |
| TestS3AExceptionTranslation | test301ContainsEndpoint | Missing endpoint in SDK exception (https://github.com/aws/aws-sdk-java-v2/issues/3048) |
| TestStreamChangeTracker | testCopyETagRequired, testCopyVersionIdRequired | Transfer Manager response does not yet have `CopyObjectResult` |
| ITestCustomSigner | testCustomSignerAndInitializer | Signers not upgraded yet |
| ITestS3AFileContextStatistics | testStatistics | Further investigation needed |
| ITestS3AEncryptionSSEC | multiple tests (14 out of 24) | Transfer Manager issue with SSE-C |
| ITestXAttrCost | testXAttrRoot | `headObject()` with empty key fails |
| ITestSessionDelegationInFilesystem | testDelegatedFileSystem | Succeeds, but only with the `headObject()` call with an empty key commented out |
| ITestS3ACannedACLs | testCreatedObjectsHaveACLs | `AWSCannedACL.LogDeliveryWrite` not supported in SDK v2 |
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [x] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?