passaro opened a new pull request, #5163:
URL: https://github.com/apache/hadoop/pull/5163

   ### Description of PR
   
   This is an initial draft PR containing all the changes implemented so far to 
upgrade S3A to the AWS SDK v2. Note that this is still a work in progress: we 
plan to keep contributing to it to fill the remaining gaps and to update the SDK 
as missing features are released (e.g. support for client-side encryption and 
the public release of the new Transfer Manager, currently in preview). 
   
   In the meantime, this PR should provide a view of the whole set of changes 
and start a conversation on the remaining open questions and on how to handle 
breaking changes that affect S3A.
   
   The new document at 
`hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_v2_changelog.md`
   discusses the key changes contained in this PR and is the suggested starting 
point for the review. 
   
   Further open questions to be discussed:
   
   1. The region logic. Previously, if an endpoint was configured but no 
region, S3A parsed the region from the endpoint; if the configured endpoint was 
the standard us-east-1 endpoint, the region was set to null and the SDK left to 
work it out; and if no endpoint was configured, the region was set to us-east-1 
with `.withForceGlobalBucketAccessEnabled()`. In SDK v2 there is no cross-region 
access, so the correct region of the bucket needs to be set. We therefore now 
obtain the region of the bucket with a HeadBucket call and set it. In general, 
the guidance for the new SDK is to set only the region and let the SDK 
determine the endpoint.
   
   2. Bucket probes. These are currently done with doesBucketExist and 
doesBucketExistV2. Why do we need these two separate levels? There is no 
doesBucketExist operation in SDK v2, so it will need to be replaced with a 
HeadBucket or GetBucketAcl call. Also consider that, with the new region logic, 
we will need to issue a HeadBucket while configuring the client if the region 
isn't specified.
   
   3. Progress Listeners. SDK v2 currently does not support attaching progress 
listeners to requests outside the Transfer Manager. We use them on Put and 
UploadPart requests in S3ABlockOutputStream. Are they required for the upgrade?
   
   4. ACLs. LogDeliveryWrite, a bucket-level ACL, is no longer supported in 
SDK v2. S3A appears to use ACLs only at the object level. Can this ACL be 
removed?
   
   5. Transfer Manager. It is no longer possible to configure the threshold 
above which the Transfer Manager is used; the default is 8 MB.
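   The v1 endpoint-to-region logic described in (1) can be sketched as a small 
pure helper. This is an illustrative sketch only, not S3A's actual 
implementation: the class and method names are ours, and it handles only the 
common `s3.<region>.amazonaws.com` and legacy `s3-<region>.amazonaws.com` 
endpoint shapes.

   ```java
   // Illustrative sketch (hypothetical names): mirrors the v1 behaviour of
   // deriving a region from a configured endpoint, as described above.
   public class EndpointRegionParser {

       /**
        * Best-effort region extraction from an endpoint such as
        * "s3.eu-west-2.amazonaws.com". Returns null for the standard
        * us-east-1 endpoint, deferring to the SDK's own resolution.
        */
       public static String parseRegion(String endpoint) {
           String host = endpoint.replaceFirst("^https?://", "");
           if (host.equals("s3.amazonaws.com")) {
               return null;                      // global endpoint: let the SDK decide
           }
           String[] parts = host.split("\\.");
           if (parts.length >= 3 && parts[0].equals("s3")) {
               return parts[1];                  // "s3.<region>.amazonaws.com"
           }
           if (parts[0].startsWith("s3-")) {
               return parts[0].substring(3);     // legacy "s3-<region>.amazonaws.com"
           }
           return null;                          // unknown shape: let the SDK resolve it
       }
   }
   ```

   A `null` return means "defer to the SDK", matching the v1 handling of the 
global endpoint.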
   
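   On (2): since SDK v2 has no `doesBucketExist`, both v1 probe levels would 
collapse onto a single HeadBucket-backed check. A minimal sketch of that 
mapping (the class and method names are ours; the levels follow the existing 
`fs.s3a.bucket.probe` semantics):

   ```java
   // Illustrative sketch only: maps S3A's bucket probe level to the S3
   // operation that would back it after the v2 upgrade.
   public class BucketProbe {
       public static String operationFor(int probeLevel) {
           if (probeLevel <= 0) {
               return "none";        // probe disabled
           }
           // v1 distinguished doesBucketExist (1) and doesBucketExistV2 (2);
           // SDK v2 offers neither, so both levels fall back to HeadBucket.
           return "HeadBucket";
       }
   }
   ```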
   
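   If the per-request progress listeners from (3) turn out to be required, one 
possible workaround (a sketch under hypothetical names, not an SDK or S3A API) 
is to wrap the request body stream and count bytes as the SDK reads them, 
feeding the count into whatever progress reporting S3A already tracks:

   ```java
   import java.io.FilterInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.util.function.LongConsumer;

   // Hypothetical sketch: reports bytes to a callback as they are read,
   // independent of any SDK listener support.
   public class CountingInputStream extends FilterInputStream {
       private final LongConsumer onBytes;

       public CountingInputStream(InputStream in, LongConsumer onBytes) {
           super(in);
           this.onBytes = onBytes;
       }

       @Override
       public int read() throws IOException {
           int b = super.read();
           if (b != -1) {
               onBytes.accept(1);
           }
           return b;
       }

       @Override
       public int read(byte[] buf, int off, int len) throws IOException {
           int n = super.read(buf, off, len);
           if (n > 0) {
               onBytes.accept(n);
           }
           return n;
       }
   }
   ```

   The callback could, for instance, forward to the progress callback already 
passed into S3ABlockOutputStream.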
   ### How was this patch tested?
   
   Ran `mvn -Dparallel-tests -DtestsThreadCount=8 clean verify` against `eu-west-2`.
   
   The following tests are currently failing:
   
   |Test Suite |Test Name |Reason |
   |--- |--- |--- |
   |TestS3AExceptionTranslation |test301ContainsEndpoint |Missing endpoint in SDK exception (https://github.com/aws/aws-sdk-java-v2/issues/3048) |
   |TestStreamChangeTracker |testCopyETagRequired, testCopyVersionIdRequired |Transfer Manager response does not yet have `CopyObjectResult` |
   |ITestCustomSigner |testCustomSignerAndInitializer |Signers not upgraded yet |
   |ITestS3AFileContextStatistics |testStatistics |Further investigation needed |
   |ITestS3AEncryptionSSEC |multiple tests (14 out of 24) |Transfer Manager issue with SSE-C |
   |ITestXAttrCost |testXAttrRoot |`headObject()` with empty key fails |
   |ITestSessionDelegationInFileystem |testDelegatedFileSystem |Succeeds, but `headObject()` with empty key commented out |
   |ITestS3ACannedACLs |testCreatedObjectsHaveACLs |`AWSCannedACL.LogDeliveryWrite` not supported in SDK v2 |
   
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [x] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

