ahmarsuhail commented on code in PR #6479:
URL: https://github.com/apache/hadoop/pull/6479#discussion_r1472764364


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -257,6 +283,83 @@ public void testWithVPCE() throws Throwable {
     expectInterceptorException(client);
   }
 
+  @Test
+  public void testCentralEndpointWithUSWest2Region() throws Throwable {
+    describe("Access bucket using central endpoint and us-west-2 region");
+    final Configuration conf = getConfiguration();
+    removeBaseAndBucketOverrides(conf, ENDPOINT, AWS_REGION);
+
+    final Configuration newConf = new Configuration(conf);
+
+    newConf.set(ENDPOINT, CENTRAL_ENDPOINT);
+    newConf.set(AWS_REGION, US_WEST_2);
+
+    newFS = new S3AFileSystem();
+    newFS.initialize(getFileSystem().getUri(), newConf);
+
+    assertOpsUsingNewFs();
+  }
+
+  @Test
+  public void testCentralEndpointWithEUWest2Region() throws Throwable {

Review Comment:
   the value in having both `testCentralEndpointWithUSWest2Region` and 
`testCentralEndpointWithEUWest2Region` only comes in when your test bucket is 
in us-west-2, and then with the eu-west-2 test you check if cross region access 
behaves as expected. since we can't guarantee where users test against, I don't 
think we need both tests. Unless we end up using one of the public buckets, as 
for that we know the region. 



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -146,7 +150,29 @@ public void testCentralEndpoint() throws Throwable {
     describe("Create a client with the central endpoint");
     Configuration conf = getConfiguration();
 
-    S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, null, US_EAST_1, 
false);
+    S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, null, US_EAST_2, 
false);
+
+    expectInterceptorException(client);
+  }
+
+  @Test
+  public void testCentralEndpointWithRegion() throws Throwable {
+    describe("Create a client with the central endpoint but also specify 
region");
+    Configuration conf = getConfiguration();
+
+    S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, US_WEST_2, 
US_WEST_2, false);

Review Comment:
   it's not clear to me what these different test cases are doing. looks like 
they call check if you set endpoint to central and also configure a region, 
it's always the configured region that gets set. do we really need all of them?



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_v2_changelog.md:
##########
@@ -93,6 +93,20 @@ the bucket. The behaviour has now been changed to:
   response. Otherwise it will throw an error with status code 301 permanently 
moved. This error
   contains the region of the bucket in its header, which we can then use to 
configure the client.
 

Review Comment:
   A lot of what is in this doc is outdated, it was written when we raised the 
initial PR for the upgrade to help with reviewing. For example, we don't do the 
HEAD call to get the region of the bucket. We had to do this when crossRegion 
was not supported in SDK V2. 
   
   Let's create a new section in `index.md` which explains region handling, 
something like:
   
   * `fs.s3a.endpoint` and `fs.s3a.endpoint.region` can be used to set values 
for endpoint and region. 
   * If a value is set in `fs.s3a.endpoint.region` , S3A will configure the S3 
Client to use this value. If this is set to a region that does not match your 
bucket, you will receive a 301 redirect response.
   * If no region is set, but an endpoint is set, S3A will attempt to parse 
your region from the endpoint. 
   * If your endpoint is set to the central `s3.amazonaws.com`, S3A will enable 
cross region access. This means that even if the region 
`fs.s3a.endpoint.region` is incorrect, the SDK will determine the region. This 
is done by making the request, and if the SDK receives a 301 redirect, it 
determines the region at the cost of a HEAD request, and caches it. 
   * If no region is no set, and none could be determined from the endpoint, 
S3A will use US_EAST_2 as the default region and enable cross region access. 
Again, this means that requests will be not fail, but an initial HEAD request 
will be made to get the region of your bucket. 
   * If the configured region is an empty string, we fallback to using the SDK 
region [resolution 
chain](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
   
   I don't think we need to explain specifically what the code is doing in too 
much detail, rather just the end behaviour the user will see.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to