[ 
https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812281#comment-17812281
 ] 

ASF GitHub Bot commented on HADOOP-19044:
-----------------------------------------

ahmarsuhail commented on code in PR #6479:
URL: https://github.com/apache/hadoop/pull/6479#discussion_r1471123903


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java:
##########
@@ -289,17 +290,36 @@ private <BuilderT extends S3BaseClientBuilder<BuilderT, 
ClientT>, ClientT> void
     builder.fipsEnabled(fipsEnabled);
 
     if (endpoint != null) {
+      boolean overrideEndpoint = true;
       checkArgument(!fipsEnabled,
           "%s : %s", ERROR_ENDPOINT_WITH_FIPS, endpoint);
-      builder.endpointOverride(endpoint);
-      // No region was configured, try to determine it from the endpoint.
-      if (region == null) {
-        region = getS3RegionFromEndpoint(parameters.getEndpoint());
+      boolean endpointEndsWithCentral =
+          endpointStr.endsWith(CENTRAL_ENDPOINT);
+      // No region was configured or the endpoint is central,
+      // determine the region from the endpoint.
+      if (region == null || endpointEndsWithCentral) {

Review Comment:
   hmm, not sure about this. now we're parsing region if region is null or 
endpoint = s3.amazonaws.com. 
   
   So if you set `s3.amazonaws.com` and region to eu-west-2, you still end up 
with us setting the region to `us-east-2` and cross region enabled. My thinking 
here is that a lot of people may have endpoint set to s3.amazonaws.com (as 
atleast with SDK V1 it was harmless to do that I think) .
   
   we only want to get into this parsing if region == null. so let's revert to 
the previous condition here. And then we never don't want to override if the 
endpoint is s3.amazonaws.com. Suggested:
   
   ```
       if (endpoint != null) {
         checkArgument(!fipsEnabled,
             "%s : %s", ERROR_ENDPOINT_WITH_FIPS, endpoint);
         boolean endpointEndsWithCentral =
             endpointStr.endsWith(CENTRAL_ENDPOINT);
         // No region was configured or the endpoint is central,
         // determine the region from the endpoint.
         if (region == null) {
           region = getS3RegionFromEndpoint(endpointStr,
               endpointEndsWithCentral);
           if (region != null) {
             origin = "endpoint";
             if (endpointEndsWithCentral) {
               builder.crossRegionAccessEnabled(true);
               origin = "origin with cross region access";
               LOG.debug("Enabling cross region access for endpoint {}",
                   endpointStr);
             }
           }
         }
         
         // No need to override endpoint with "s3.amazonaws.com".
         // Let the client take care of endpoint resolution. Overriding
         // the endpoint with "s3.amazonaws.com" causes 400 Bad Request
         // errors for non-existent buckets and objects.
         // ref: https://github.com/aws/aws-sdk-java-v2/issues/4846
         if (!endpointEndsWithCentral) {
           builder.endpointOverride(endpoint);
           LOG.debug("Setting endpoint to {}", endpoint);
         }
       }
   ``` 
   
   So now:
   1) if endpoint = s3.amazonaws.com and region is null, set to US_EAST_2 and 
enable cross region, and don't override endpoint.
   2) if endpoint = s3.amazonaw.com and region is set (eg to eu-west-1), set 
region but do not override endpoint..let SDK figure it out
   



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -146,7 +150,21 @@ public void testCentralEndpoint() throws Throwable {
     describe("Create a client with the central endpoint");
     Configuration conf = getConfiguration();
 
-    S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, null, US_EAST_1, 
false);
+    S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, null, US_EAST_2, 
false);
+
+    expectInterceptorException(client);
+  }
+
+  @Test
+  public void testCentralEndpointWithRegion() throws Throwable {
+    describe("Create a client with the central endpoint but also specify 
region");
+    Configuration conf = getConfiguration();
+
+    S3Client client = createS3Client(conf, CENTRAL_ENDPOINT, US_WEST_2, 
US_EAST_2, false);

Review Comment:
   for example here, if configured region is US_WEST_2, expected region should 
also be US_EAST_2





> AWS SDK V2 - Update S3A region logic 
> -------------------------------------
>
>                 Key: HADOOP-19044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19044
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Ahmar Suhail
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set 
> fs.s3a.endpoint to 
> s3.amazonaws.com here:
> [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540]
>  
>  
> HADOOP-18908, updated the region logic such that if fs.s3a.endpoint.region is 
> set, or if a region can be parsed from fs.s3a.endpoint (which will happen in 
> this case, region will be US_EAST_1), cross region access is not enabled. 
> This will cause 400 errors if the bucket is not in US_EAST_1. 
>  
> Proposed: Updated the logic so that if the endpoint is the global 
> s3.amazonaws.com , cross region access is enabled.  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to