[
https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812690#comment-17812690
]
ASF GitHub Bot commented on HADOOP-19044:
-----------------------------------------
ahmarsuhail commented on code in PR #6479:
URL: https://github.com/apache/hadoop/pull/6479#discussion_r1472764364
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -257,6 +283,83 @@ public void testWithVPCE() throws Throwable {
expectInterceptorException(client);
}
+ @Test
+ public void testCentralEndpointWithUSWest2Region() throws Throwable {
+ describe("Access bucket using central endpoint and us-west-2 region");
+ final Configuration conf = getConfiguration();
+ removeBaseAndBucketOverrides(conf, ENDPOINT, AWS_REGION);
+
+ final Configuration newConf = new Configuration(conf);
+
+ newConf.set(ENDPOINT, CENTRAL_ENDPOINT);
+ newConf.set(AWS_REGION, US_WEST_2);
+
+ newFS = new S3AFileSystem();
+ newFS.initialize(getFileSystem().getUri(), newConf);
+
+ assertOpsUsingNewFs();
+ }
+
+ @Test
+ public void testCentralEndpointWithEUWest2Region() throws Throwable {
Review Comment:
The value of having both `testCentralEndpointWithUSWest2Region` and
`testCentralEndpointWithEUWest2Region` only comes in when your test bucket is
in us-west-2; then, with the eu-west-2 test, you check whether cross-region
access behaves as expected. Since we can't guarantee which region users test
against, I don't think we need both tests, unless we end up using one of the
public buckets, whose region we do know.
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -146,7 +150,29 @@ public void testCentralEndpoint() throws Throwable {
describe("Create a client with the central endpoint");
Configuration conf = getConfiguration();
- S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, null, US_EAST_1, false);
+ S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, null, US_EAST_2, false);
+
+ expectInterceptorException(client);
+ }
+
+ @Test
+ public void testCentralEndpointWithRegion() throws Throwable {
+ describe("Create a client with the central endpoint but also specify region");
+ Configuration conf = getConfiguration();
+
+ S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, US_WEST_2, US_WEST_2, false);
Review Comment:
It's not clear to me what these different test cases are doing. It looks like
they all check that if you set the endpoint to central and also configure a
region, it's always the configured region that gets set. Do we really need all
of them?
##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_v2_changelog.md:
##########
@@ -93,6 +93,20 @@ the bucket. The behaviour has now been changed to:
response. Otherwise it will throw an error with status code 301 permanently
moved. This error
contains the region of the bucket in its header, which we can then use to
configure the client.
Review Comment:
A lot of what is in this doc is outdated; it was written when we raised the
initial PR for the upgrade, to help with reviewing. For example, we no longer
make the HEAD call to get the region of the bucket. We had to do that when
cross-region access was not supported in SDK V2.
Let's create a new section in `index.md` which explains region handling,
something like:
* `fs.s3a.endpoint` and `fs.s3a.endpoint.region` can be used to set values
for the endpoint and region.
* If a value is set in `fs.s3a.endpoint.region`, S3A will configure the S3
client to use it. If this is set to a region that does not match your bucket,
you will receive a 301 redirect response.
* If no region is set but an endpoint is, S3A will attempt to parse the
region from the endpoint.
* If your endpoint is set to the central `s3.amazonaws.com`, S3A will enable
cross-region access. This means that even if `fs.s3a.endpoint.region` is
incorrect, the SDK will determine the region itself: it makes the request,
and if it receives a 301 redirect, it determines the region at the cost of a
HEAD request and caches it.
* If no region is set and none could be determined from the endpoint, S3A
will use US_EAST_2 as the default region and enable cross-region access.
Again, this means that requests will not fail, but an initial HEAD request
will be made to get the region of your bucket.
* If the configured region is an empty string, we fall back to the SDK
region [resolution
chain](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
I don't think we need to explain in too much detail what the code is doing,
rather just the end behaviour the user will see.
> AWS SDK V2 - Update S3A region logic
> -------------------------------------
>
> Key: HADOOP-19044
> URL: https://issues.apache.org/jira/browse/HADOOP-19044
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Ahmar Suhail
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
>
> If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set
> fs.s3a.endpoint to s3.amazonaws.com here:
> [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540]
>
>
> HADOOP-18908 updated the region logic so that if fs.s3a.endpoint.region is
> set, or if a region can be parsed from fs.s3a.endpoint (which will happen in
> this case; the region will be US_EAST_1), cross-region access is not enabled.
> This will cause 400 errors if the bucket is not in US_EAST_1.
>
> Proposed: update the logic so that if the endpoint is the global
> s3.amazonaws.com, cross-region access is enabled.
>
>
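The resolution order proposed in the issue and review above can be sketched as
a small standalone method. This is not the actual S3A implementation; the
class name, method name, and the endpoint-parsing regex are illustrative
assumptions, and only the ordering (explicit region, then central endpoint,
then parse from endpoint, then SDK chain) follows the discussion:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the region-resolution order discussed in
// HADOOP-19044; names and regex are illustrative, not S3A internals.
public class RegionResolutionSketch {
    static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

    // Matches regional endpoints such as s3.eu-west-2.amazonaws.com
    // and captures the region token.
    static final Pattern REGIONAL =
        Pattern.compile("s3[.-]([a-z0-9-]+)\\.amazonaws\\.com");

    /**
     * Returns the region to configure on the client, or empty to fall
     * back to the SDK's own region resolution chain.
     */
    static Optional<String> resolveRegion(String endpoint, String configuredRegion) {
        if (configuredRegion != null && !configuredRegion.isEmpty()) {
            // An explicit fs.s3a.endpoint.region always wins.
            return Optional.of(configuredRegion);
        }
        if (endpoint == null || endpoint.isEmpty()
            || endpoint.equals(CENTRAL_ENDPOINT)) {
            // Central or unset endpoint: pick a default region and rely on
            // cross-region access to redirect to the bucket's real region.
            return Optional.of("us-east-2");
        }
        Matcher m = REGIONAL.matcher(endpoint);
        if (m.matches()) {
            // Region parsed from a regional endpoint.
            return Optional.of(m.group(1));
        }
        // Unknown endpoint shape: defer to the SDK resolution chain.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolveRegion("s3.amazonaws.com", null).get());
        System.out.println(resolveRegion("s3.eu-west-2.amazonaws.com", null).get());
        System.out.println(resolveRegion("s3.amazonaws.com", "us-west-2").get());
    }
}
```

With this ordering, the Spark case from the issue (central endpoint set, no
region) resolves to a default region with cross-region access, instead of
locking the client to US_EAST_1.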
--
This message was sent by Atlassian Jira
(v8.20.10#820010)