[ https://issues.apache.org/jira/browse/HADOOP-18565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701292#comment-17701292 ]

ASF GitHub Bot commented on HADOOP-18565:
-----------------------------------------

steveloughran commented on code in PR #5421:
URL: https://github.com/apache/hadoop/pull/5421#discussion_r1138885312


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/NetworkBinding.java:
##########
@@ -23,9 +23,9 @@
 import java.net.URI;
 import java.net.URISyntaxException;
 
-import com.amazonaws.ClientConfiguration;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.http.apache.ApacheHttpClient;

Review Comment:
   is this going to break when the hadoop-aws module is running against an unshaded version of the SDK? the reason this class already does some reflection games is to downgrade gracefully in that situation: you can't switch to wildfly and openssl, but you can at least keep working



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/SignerFactory.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.auth;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.auth.signer.Aws4Signer;
+import software.amazon.awssdk.auth.signer.Aws4UnsignedPayloadSigner;
+import software.amazon.awssdk.auth.signer.AwsS3V4Signer;
+import software.amazon.awssdk.core.signer.NoOpSigner;
+import software.amazon.awssdk.core.signer.Signer;
+
+import java.io.IOException;

Review Comment:
   nit: make the java imports the topmost block of imports
   



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -834,58 +834,28 @@ protected static S3AStorageStatistics createStorageStatistics(
    * @throws UnknownStoreException the bucket is absent
    * @throws IOException any other problem talking to S3
    */
-  // TODO: Review: this used to call doesBucketExist in v1, which does not check permissions,
-  //  not even read access.
   @Retries.RetryTranslated
   protected void verifyBucketExists() throws UnknownStoreException, IOException {
     if (!invoker.retry("doesBucketExist", bucket, true,
         trackDurationOfOperation(getDurationTrackerFactory(), STORE_EXISTS_PROBE.getSymbol(),
             () -> {
               try {
+                if (bucketRegions.containsKey(bucket)) {
+                  return true;
+                }
                 s3Client.headBucket(HeadBucketRequest.builder().bucket(bucket).build());
                 return true;
-              } catch (NoSuchBucketException e) {
-                return false;
-              }
-            }))) {
-      throw new UnknownStoreException("s3a://" + bucket + "/", " Bucket does " + "not exist");
-    }
-  }
-
-  /**
-   * Verify that the bucket exists. This will correctly throw an exception
-   * when credentials are invalid.
-   * TODO: Review. May be redundant in v2.
-   * Retry policy: retrying, translated.
-   * @throws UnknownStoreException the bucket is absent
-   * @throws IOException any other problem talking to S3
-   */
-  @Retries.RetryTranslated
-  protected void verifyBucketExistsV2()
-      throws UnknownStoreException, IOException {
-    if (!invoker.retry("doesBucketExistV2", bucket, true,
-        trackDurationOfOperation(getDurationTrackerFactory(),
-            STORE_EXISTS_PROBE.getSymbol(),
-            () -> {
-              // Bug in SDK always returns `true` for AccessPoint ARNs with `doesBucketExistV2()`
-              // expanding implementation to use ARNs and buckets correctly
-              try {
-                s3Client.getBucketAcl(GetBucketAclRequest.builder()
-                    .bucket(bucket)
-                    .build());
               } catch (AwsServiceException ex) {
                 int statusCode = ex.statusCode();
                 if (statusCode == SC_404_NOT_FOUND ||
-                    (statusCode == SC_403_FORBIDDEN &&
-                        ex.getMessage().contains(AP_INACCESSIBLE))) {
+                    (statusCode == SC_403_FORBIDDEN && accessPoint != null)) {
                   return false;
                 }
               }
 
               return true;
             }))) {
-      throw new UnknownStoreException("s3a://" + bucket + "/", " Bucket does "
-          + "not exist");
+      throw new UnknownStoreException("s3a://" + bucket + "/", " Bucket does " + "not exist");

Review Comment:
   thought. should the fs.s3a.endpoint value be included, as trying to talk to 
an on-prem store without setting that value is not entirely uncommon?
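   a rough sketch of that, assuming the ENDPOINT / DEFAULT_ENDPOINT constants from Constants (wording illustrative):

```java
// sketch only: surface the configured endpoint so a missing or wrong
// fs.s3a.endpoint against a third-party/on-prem store is easier to diagnose.
throw new UnknownStoreException("s3a://" + bucket + "/",
    " Bucket does not exist; fs.s3a.endpoint = "
        + getConf().getTrimmed(ENDPOINT, DEFAULT_ENDPOINT));
```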



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -834,58 +834,28 @@ protected static S3AStorageStatistics createStorageStatistics(
    * @throws UnknownStoreException the bucket is absent
    * @throws IOException any other problem talking to S3
    */
-  // TODO: Review: this used to call doesBucketExist in v1, which does not check permissions,
-  //  not even read access.
   @Retries.RetryTranslated
   protected void verifyBucketExists() throws UnknownStoreException, IOException {
     if (!invoker.retry("doesBucketExist", bucket, true,
         trackDurationOfOperation(getDurationTrackerFactory(), STORE_EXISTS_PROBE.getSymbol(),

Review Comment:
   we don't audit this. it should be invoked via `trackDurationAndSpan()` so we 
can track these calls in the s3 logs
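   a sketch of the shape, assuming the trackDurationAndSpan() overload used for STORE_REGION_PROBE later in this class; "existenceProbe" stands for the existing lambda shown above:

```java
// sketch: wrap the whole probe in an audit span so the HEAD bucket request
// is attributed in the S3 server logs as well as in the duration statistics.
boolean exists = trackDurationAndSpan(STORE_EXISTS_PROBE, bucket, null, () ->
    invoker.retry("doesBucketExist", bucket, true, existenceProbe));
if (!exists) {
  throw new UnknownStoreException("s3a://" + bucket + "/", " Bucket does not exist");
}
```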



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##########
@@ -136,7 +143,7 @@ public static NettyNioAsyncHttpClient.Builder createAsyncHttpClientBuilder(Confi
     httpClientBuilder.readTimeout(Duration.ofSeconds(socketTimeout));
     httpClientBuilder.writeTimeout(Duration.ofSeconds(socketTimeout));
 
-    // TODO: Need to set ssl socket factory, as done in
+    // TODO: Don't think you can set a socket factory for the netty client.

Review Comment:
   oh well. why not create a follow-up jira which could be looked at in future.



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/HeaderProcessing.java:
##########
@@ -276,9 +277,28 @@ private Map<String, byte[]> retrieveHeaders(
       final Statistic statistic) throws IOException {
     StoreContext context = getStoreContext();
     String objectKey = context.pathToKey(path);
-    HeadObjectResponse md;
     String symbol = statistic.getSymbol();
     S3AStatisticsContext instrumentation = context.getInstrumentation();
+    Map<String, byte[]> headers = new TreeMap<>();
+    HeadObjectResponse md;
+
+    // Attempting to get metadata for the root, so use head bucket.
+    if (objectKey.isEmpty()) {

Review Comment:
   never thought about this. is this done in a test? i know getFileStatus(/) 
just makes up a status without talking to the store. 
   returning headers on a getXAttr call would actually be useful if it returned 
interesting headers. 



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/InconsistentS3ClientFactory.java:
##########
@@ -45,7 +46,7 @@ public class InconsistentS3ClientFactory extends DefaultS3ClientFactory {
 

Review Comment:
   i think we could just cut this class; it was there for generating 
inconsistencies more reliably. mockito is better for managed fault injection.
   
   proposed; do it as a followup



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/HeaderProcessing.java:
##########
@@ -276,9 +277,28 @@ private Map<String, byte[]> retrieveHeaders(
       final Statistic statistic) throws IOException {
     StoreContext context = getStoreContext();
     String objectKey = context.pathToKey(path);
-    HeadObjectResponse md;
     String symbol = statistic.getSymbol();
     S3AStatisticsContext instrumentation = context.getInstrumentation();
+    Map<String, byte[]> headers = new TreeMap<>();
+    HeadObjectResponse md;
+
+    // Attempting to get metadata for the root, so use head bucket.
+    if (objectKey.isEmpty()) {
+      HeadBucketResponse headBucketResponse =
+          trackDuration(instrumentation, symbol, () -> callbacks.getBucketMetadata());
+
+      if (headBucketResponse.sdkHttpResponse() != null
+          && headBucketResponse.sdkHttpResponse().headers() != null
+          && headBucketResponse.sdkHttpResponse().headers().get("Content-Type") != null) {

Review Comment:
   make a constant for this; the class did use 
com.amazonaws.services.s3.Headers for it in the past. I don't see any standard 
set of headers declared in hadoop-common, so just do it in this file
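   e.g. something like this, kept local to HeaderProcessing (name illustrative):

```java
/** Content-Type HTTP header; previously referenced via com.amazonaws.services.s3.Headers. */
private static final String CONTENT_TYPE = "Content-Type";

// then the check above becomes:
// headBucketResponse.sdkHttpResponse().headers().get(CONTENT_TYPE) != null
```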



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java:
##########
@@ -807,10 +742,8 @@ private static AWSCredentialsProvider createAWSV1CredentialProvider(
    * @return the instantiated class
    * @throws IOException on any instantiation failure.
    */
-  private static AwsCredentialsProvider createAWSV2CredentialProvider(
-      Configuration conf,
-      Class<?> credClass,
-      @Nullable URI uri) throws IOException {
+  private static AwsCredentialsProvider createAWSV2CredentialProvider(Configuration conf,

Review Comment:
   how about moving this credential stuff into its own class under s3a.auth? it's complicated enough to merit it, and pulling it out should make maintenance easier, as other changes to S3AUtils won't interfere
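   purely for illustration (class and method names here are hypothetical, not part of this PR), the kind of extraction being suggested:

```java
package org.apache.hadoop.fs.s3a.auth;

import java.io.IOException;
import java.net.URI;

import javax.annotation.Nullable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.AWSCredentialProviderList;

/**
 * Hypothetical home for the credential provider wiring currently in S3AUtils.
 */
public final class CredentialProviderListFactory {

  private CredentialProviderListFactory() {
  }

  /**
   * Build the credential provider chain for a bucket from the configuration;
   * the body would be the createAWSCredentialProviderSet() code and its
   * v1/v2 helpers moved over from S3AUtils.
   */
  public static AWSCredentialProviderList createCredentialProviderList(
      @Nullable URI binding, Configuration conf) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }
}
```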



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFileystem.java:
##########
@@ -330,10 +330,9 @@ public void testDelegatedFileSystem() throws Throwable {
         + " if role restricted, permissions are tightened.");
     S3AFileSystem fs = getFileSystem();
     // force a probe of the remote FS to make sure its endpoint is valid
-    // TODO: Previously a call to getObjectMetadata for a base path, ie with an empty key would
-    //  return some metadata. (bucket region, content type). headObject() fails without a key, check
-    // how this can be fixed.
-    // fs.getObjectMetadata(new Path("/"));
+    // TODO: Check what should happen here. Calling headObject() on the root path fails in V2,

Review Comment:
   let's not worry about it if it is too hard to do



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -725,18 +728,15 @@ private void doBucketProbing() throws IOException {
       LOG.debug("skipping check for bucket existence");
       break;
     case 1:
-      logDnsLookup(getConf());
-      verifyBucketExists();
-      break;
     case 2:
       logDnsLookup(getConf());
-      verifyBucketExistsV2();
+      verifyBucketExists();
       break;
     default:
       // we have no idea what this is, assume it is from a later release.
       LOG.warn("Unknown bucket probe option {}: {}; falling back to check #2",

Review Comment:
   1. the warning text is out of date
   2. how about we fall back to "not checking"? it is the current default after all
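   i.e. something like this sketch (local variable name illustrative):

```java
default:
  // unknown probe level, assume it comes from a later release:
  // warn and fall back to the current default of not probing at all.
  LOG.warn("Unknown bucket probe option {}: {}; skipping check for bucket existence",
      S3A_BUCKET_PROBE, bucketProbe);
  break;
```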



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -987,12 +963,81 @@ private void bindAWSClient(URI name, boolean dtEnabled) throws IOException {
         .withRequesterPays(conf.getBoolean(ALLOW_REQUESTER_PAYS, DEFAULT_ALLOW_REQUESTER_PAYS))
         .withExecutionInterceptors(auditManager.createExecutionInterceptors())
         .withMinimumPartSize(partSize)
-        .withTransferManagerExecutor(unboundedThreadPool);
+        .withTransferManagerExecutor(unboundedThreadPool)
+        .withRegion(region);
 
     S3ClientFactory clientFactory = ReflectionUtils.newInstance(s3ClientFactoryClass, conf);
-    s3Client = clientFactory.createS3ClientV2(getUri(), parameters);
+    s3Client = clientFactory.createS3Client(getUri(), parameters);
     s3AsyncClient = clientFactory.createS3AsyncClient(getUri(), parameters);
-    transferManager =  clientFactory.createS3TransferManager(getUri(), parameters);
+    transferManager =  clientFactory.createS3TransferManager(s3AsyncClient);
+  }
+
+  /**
+   * Get the bucket region.
+   *
+   * @param region AWS S3 Region set in the config. This property may not be set, in which case
+   *               ask S3 for the region.
+   * @return region of the bucket.
+   */
+  private Region getS3Region(String region) throws IOException {
+
+    if (!StringUtils.isBlank(region)) {
+      return Region.of(region);
+    }
+
+    Region cachedRegion = bucketRegions.get(bucket);
+
+    if (cachedRegion != null) {
+      LOG.debug("Got region {} for bucket {} from cache", cachedRegion, bucket);
+      return cachedRegion;
+    }
+
+    Region s3Region = trackDurationAndSpan(STORE_REGION_PROBE, bucket, null,
+        () -> invoker.retry("getS3Region", bucket, true, () -> {
+          try {
+
+            LOG.warn(

Review Comment:
   use a LogOnce to keep the noise down
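   presumably org.apache.hadoop.fs.store.LogExactlyOnce, already used elsewhere in s3a; a sketch (field name illustrative):

```java
// one-shot warning so repeated region lookups don't flood the logs.
private static final LogExactlyOnce LOG_REGION_LOOKUP = new LogExactlyOnce(LOG);

// in getS3Region(), replacing the plain LOG.warn():
LOG_REGION_LOOKUP.warn("Getting region for bucket {} from S3;"
    + " set fs.s3a.endpoint.region to avoid this lookup", bucket);
```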



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -987,12 +963,81 @@ private void bindAWSClient(URI name, boolean dtEnabled) throws IOException {
         .withRequesterPays(conf.getBoolean(ALLOW_REQUESTER_PAYS, DEFAULT_ALLOW_REQUESTER_PAYS))
         .withExecutionInterceptors(auditManager.createExecutionInterceptors())
         .withMinimumPartSize(partSize)
-        .withTransferManagerExecutor(unboundedThreadPool);
+        .withTransferManagerExecutor(unboundedThreadPool)
+        .withRegion(region);
 
     S3ClientFactory clientFactory = ReflectionUtils.newInstance(s3ClientFactoryClass, conf);
-    s3Client = clientFactory.createS3ClientV2(getUri(), parameters);
+    s3Client = clientFactory.createS3Client(getUri(), parameters);
     s3AsyncClient = clientFactory.createS3AsyncClient(getUri(), parameters);
-    transferManager =  clientFactory.createS3TransferManager(getUri(), parameters);
+    transferManager =  clientFactory.createS3TransferManager(s3AsyncClient);
+  }
+
+  /**
+   * Get the bucket region.
+   *
+   * @param region AWS S3 Region set in the config. This property may not be set, in which case
+   *               ask S3 for the region.
+   * @return region of the bucket.
+   */
+  private Region getS3Region(String region) throws IOException {

Review Comment:
   is there an elegant way the bucket region cache/lookup could be implemented in a self-contained class in the impl package? we are always struggling to keep the size of the S3AFileSystem class under control for manageability and maintenance.
   
   imagine some RegionResolver interface whose implementation is given the config and credentials and talks to s3. the mock classes could then just supply an implementation which made up a region. It would also allow for testing of the caching mechanism.
   this could be done as a followup to this PR here; extend ExecutingStoreOperation with a new op
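   purely illustrative (nothing here exists yet), the shape that could take in the impl package:

```java
package org.apache.hadoop.fs.s3a.impl;

import java.io.IOException;

import software.amazon.awssdk.regions.Region;

/**
 * Hypothetical sketch: resolves and caches the region of a bucket.
 * The production implementation would consult the configuration, then a
 * shared cache, then S3 itself; tests could supply one that makes a region up.
 */
public interface RegionResolver {

  /** Resolve the region for a bucket, caching the answer. */
  Region resolveRegion(String bucket) throws IOException;
}
```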



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -435,6 +436,8 @@ public class S3AFileSystem extends FileSystem implements StreamCapabilities,
    */
   private final Set<Path> deleteOnExit = new TreeSet<>();
 
+  private final Map<String, Region> bucketRegions = new HashMap<>();

Review Comment:
   this would need to be static to work (otherwise every filesystem instance gets its own cache and nothing is shared)
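   i.e. something like the sketch below; and, per the SignerFactory comment further down, make it a concurrent map, since it would now be shared across instances and threads:

```java
/** Region cache shared by all S3AFileSystem instances in the process. */
private static final Map<String, Region> BUCKET_REGIONS = new ConcurrentHashMap<>();
```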



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/SignerFactory.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.auth;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.auth.signer.Aws4Signer;
+import software.amazon.awssdk.auth.signer.Aws4UnsignedPayloadSigner;
+import software.amazon.awssdk.auth.signer.AwsS3V4Signer;
+import software.amazon.awssdk.core.signer.NoOpSigner;
+import software.amazon.awssdk.core.signer.Signer;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.fs.s3a.S3AUtils;
+
+
+/**
+ * Signer factory used to register and create signers.
+ */
+public final class SignerFactory {

Review Comment:
   worried about thread safety here as the map isn't concurrent and there are 
public methods to update it. ConcurrentHashMap would be safer
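   a minimal fix along those lines:

```java
// concurrent map: registerSigner() is public and may be called while other
// threads are creating signers during filesystem initialization.
private static final Map<String, Class<? extends Signer>> SIGNERS
    = new ConcurrentHashMap<>();
```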



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/SignerFactory.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.auth;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.auth.signer.Aws4Signer;
+import software.amazon.awssdk.auth.signer.Aws4UnsignedPayloadSigner;
+import software.amazon.awssdk.auth.signer.AwsS3V4Signer;
+import software.amazon.awssdk.core.signer.NoOpSigner;
+import software.amazon.awssdk.core.signer.Signer;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.fs.s3a.S3AUtils;
+
+
+/**
+ * Signer factory used to register and create signers.
+ */
+public final class SignerFactory {
+
+  private static final Logger LOG = LoggerFactory.getLogger(SignerFactory.class);
+  public static final String VERSION_FOUR_SIGNER = "AWS4SignerType";
+  public static final String VERSION_FOUR_UNSIGNED_PAYLOAD_SIGNER = "AWS4UnsignedPayloadSignerType";
+  public static final String NO_OP_SIGNER = "NoOpSignerType";
+  private static final String S3_V4_SIGNER = "AWSS3V4SignerType";
+
+  private static final Map<String, Class<? extends Signer>> SIGNERS
+      = new HashMap<>();
+
+  static {
+    // Register the standard signer types.
+    SIGNERS.put(VERSION_FOUR_SIGNER, Aws4Signer.class);
+    SIGNERS.put(VERSION_FOUR_UNSIGNED_PAYLOAD_SIGNER, Aws4UnsignedPayloadSigner.class);
+    SIGNERS.put(NO_OP_SIGNER, NoOpSigner.class);
+    SIGNERS.put(S3_V4_SIGNER, AwsS3V4Signer.class);
+  }
+
+
+  private SignerFactory() {
+  }
+
+  /**
+   * Register an implementation class for the given signer type.
+   *
+   * @param signerType  The name of the signer type to register.
+   * @param signerClass The class implementing the given signature protocol.
+   */
+  public static void registerSigner(
+      final String signerType,
+      final Class<? extends Signer> signerClass) {
+
+    if (signerType == null) {
+      throw new IllegalArgumentException("signerType cannot be null");
+    }
+    if (signerClass == null) {
+      throw new IllegalArgumentException("signerClass cannot be null");
+    }
+
+    SIGNERS.put(signerType, signerClass);
+  }
+
+  /**
+   * Check if the signer has already been registered.
+   * @param signerType signer to get
+   */
+  public static void getSigner(String signerType) {

Review Comment:
   prefer a name like "verifySignerRegistered", as get* methods usually return something
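   e.g. a sketch:

```java
/**
 * Verify that the signer has already been registered.
 * @param signerType signer to check
 * @throws IllegalArgumentException if the signer type is unknown
 */
public static void verifySignerRegistered(String signerType) {
  if (!SIGNERS.containsKey(signerType)) {
    throw new IllegalArgumentException("unknown signer type: " + signerType);
  }
}
```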



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ACannedACLs.java:
##########
@@ -57,8 +57,6 @@ protected Configuration createConfiguration() {
     Configuration conf = super.createConfiguration();
     removeBaseAndBucketOverrides(conf,
         CANNED_ACL);
-    // TODO: Check why we need this ACL? V2 does not have a LOG_DELIVERY_WRITE ACL which causes

Review Comment:
   just needed some acl to set to make sure we were doing it properly; look at the jira HADOOP-17822



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AProxy.java:
##########
@@ -20,18 +20,17 @@
 
 import java.io.IOException;
 
-import com.amazonaws.ClientConfiguration;
-import com.amazonaws.Protocol;
 import org.assertj.core.api.Assertions;
 import org.junit.Test;
+import software.amazon.awssdk.http.apache.ProxyConfiguration;
 
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.s3a.impl.AWSClientConfig;
 import org.apache.hadoop.test.AbstractHadoopTestBase;
 
 import static org.apache.hadoop.fs.s3a.Constants.PROXY_HOST;
 import static org.apache.hadoop.fs.s3a.Constants.PROXY_PORT;
 import static org.apache.hadoop.fs.s3a.Constants.PROXY_SECURED;
-import static org.apache.hadoop.fs.s3a.S3AUtils.initProxySupport;
 
 /**

Review Comment:
   this is fixed now, right?



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -21,137 +21,138 @@
 import java.io.IOException;
 import java.net.URI;
 import java.net.URISyntaxException;
+import java.util.ArrayList;
+import java.util.List;
 
-import com.amazonaws.ClientConfiguration;
-import com.amazonaws.client.builder.AwsClientBuilder;
-import com.amazonaws.services.s3.AmazonS3;
-import com.amazonaws.util.AwsHostNameUtils;
 import org.assertj.core.api.Assertions;
+import org.junit.Assert;
 import org.junit.Test;
+import software.amazon.awssdk.awscore.AwsExecutionAttribute;
+import software.amazon.awssdk.awscore.exception.AwsServiceException;
+import software.amazon.awssdk.core.interceptor.Context;
+import software.amazon.awssdk.core.interceptor.ExecutionAttributes;
+import software.amazon.awssdk.core.interceptor.ExecutionInterceptor;
+import software.amazon.awssdk.services.s3.S3Client;
+import software.amazon.awssdk.services.s3.model.HeadBucketRequest;
+import software.amazon.awssdk.services.s3.model.S3Exception;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.s3a.statistics.impl.EmptyS3AStatisticsContext;
 
 import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
-import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_CENTRAL_REGION;
-import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_ENDPOINT;
-import static org.apache.hadoop.fs.s3a.impl.InternalConstants.AWS_REGION_SYSPROP;
-import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+import static org.apache.hadoop.fs.s3a.Constants.BUCKET_REGION_HEADER;
+import static org.apache.hadoop.fs.s3a.Statistic.STORE_REGION_PROBE;
+import static org.apache.hadoop.fs.s3a.impl.InternalConstants.SC_301_MOVED_PERMANENTLY;
 
 /**
  * Test to check correctness of S3A endpoint regions in
  * {@link DefaultS3ClientFactory}.
  */
 public class ITestS3AEndpointRegion extends AbstractS3ATestBase {
 
-  private static final String AWS_REGION_TEST = "test-region";
   private static final String AWS_ENDPOINT_TEST = "test-endpoint";
-  private static final String AWS_ENDPOINT_TEST_WITH_REGION =
-      "test-endpoint.some-region.amazonaws.com";
-  public static final String MARS_NORTH_2 = "mars-north-2";
 
-  /**
-   * Test to verify that setting a region with the config would bypass the
-   * construction of region from endpoint.
-   */
-  @Test
-  public void testWithRegionConfig() {
-    getFileSystem().getConf().set(AWS_REGION, AWS_REGION_TEST);
-
-    //Creating an endpoint config with a custom endpoint.
-    AwsClientBuilder.EndpointConfiguration epr = createEpr(AWS_ENDPOINT_TEST,
-        getFileSystem().getConf().getTrimmed(AWS_REGION));
-    //Checking if setting region config bypasses the endpoint region.
-    Assertions.assertThat(epr.getSigningRegion())
-        .describedAs("There is a region mismatch")
-        .isEqualTo(getFileSystem().getConf().get(AWS_REGION));
-  }
 
   /**
-   * Test to verify that not setting the region config, would lead to using
-   * endpoint to construct the region.
+   * Test to verify that not setting the region config, will lead to the client factory making
+   * a HEAD bucket call to configure the correct region. If an incorrect region is set, the HEAD
+   * bucket call in this test will raise an exception.
    */
   @Test
-  public void testWithoutRegionConfig() {
-    getFileSystem().getConf().unset(AWS_REGION);
-
-    //Creating an endpoint config with a custom endpoint containing a region.
-    AwsClientBuilder.EndpointConfiguration eprRandom =
-        createEpr(AWS_ENDPOINT_TEST_WITH_REGION,
-            getFileSystem().getConf().getTrimmed(AWS_REGION));
-    String regionFromEndpoint =
-        AwsHostNameUtils
-            .parseRegionFromAwsPartitionPattern(AWS_ENDPOINT_TEST_WITH_REGION);
-    //Checking if not setting region config leads to constructing the region
-    // from endpoint.
-    Assertions.assertThat(eprRandom.getSigningRegion())
-        .describedAs("There is a region mismatch")
-        .isNotEqualTo(getFileSystem().getConf().get(AWS_REGION))
-        .isEqualTo(regionFromEndpoint);
-  }
+  public void testWithoutRegionConfig() throws IOException {
+    Configuration conf = getConfiguration();
+    String bucket = getFileSystem().getBucket();
+    conf.unset(String.format("fs.s3a.bucket.%s.endpoint.region", bucket));
+    conf.unset(AWS_REGION);
 
-  /**
-   * Method to create EndpointConfiguration using an endpoint.
-   *
-   * @param endpoint the endpoint to be used for EndpointConfiguration creation.
-   * @return an instance of EndpointConfiguration.
-   */
-  private AwsClientBuilder.EndpointConfiguration createEpr(String endpoint,
-      String awsRegion) {
-    return DefaultS3ClientFactory.createEndpointConfiguration(endpoint,
-        new ClientConfiguration(), awsRegion);
+    S3AFileSystem fs = new S3AFileSystem();
+    fs.initialize(getFileSystem().getUri(), conf);
+
+    try  {
+      fs.getBucketMetadata();
+    } catch (S3Exception exception) {
+      if (exception.statusCode() == SC_301_MOVED_PERMANENTLY) {
+        List<String> bucketRegion =
+            exception.awsErrorDetails().sdkHttpResponse().headers().get(BUCKET_REGION_HEADER);
+        Assert.fail("Region not configured correctly, expected region: " + bucketRegion);
+      }
+    }
+
+    Assertions.assertThat(fs.getInstrumentation().getCounterValue(STORE_REGION_PROBE))
+        .describedAs("Region is not configured, region probe should have been made").isEqualTo(1);
   }
 
 
   @Test
-  public void testInvalidRegionDefaultEndpoint() throws Throwable {
-    describe("Create a client with an invalid region and the default endpoint");
+  public void testWithRegionConfig() throws IOException, URISyntaxException {
     Configuration conf = getConfiguration();
-    // we are making a big assumption about the timetable for AWS
-    // region rollout.
-    // if this test ever fails because this region now exists
-    // -congratulations!
-    conf.set(AWS_REGION, MARS_NORTH_2);
-    createMarsNorth2Client(conf);
+    conf.set(AWS_REGION, "us-east-2");
+
+    S3AFileSystem fs = new S3AFileSystem();
+    fs.initialize(new URI("s3a://landsat-pds"), conf);
+
+    Assertions.assertThat(fs.getInstrumentation().getCounterValue(STORE_REGION_PROBE))
+        .describedAs("Region is configured, region probe should not have been made").isEqualTo(0);
   }
 
   @Test
-  public void testUnsetRegionDefaultEndpoint() throws Throwable {
-    describe("Create a client with no region and the default endpoint");
+  public void testRegionCache() throws IOException, URISyntaxException {
     Configuration conf = getConfiguration();
     conf.unset(AWS_REGION);
-    createS3Client(conf, DEFAULT_ENDPOINT, AWS_S3_CENTRAL_REGION);
+    conf.unset("fs.s3a.bucket.landsat-pds.endpoint.region");
+    S3AFileSystem fs = new S3AFileSystem();
+
+    fs.initialize(new URI("s3a://landsat-pds"), conf);
+
+    Assertions.assertThat(fs.getInstrumentation().getCounterValue(STORE_REGION_PROBE))
+        .describedAs("Incorrect number of calls made to get bucket region").isEqualTo(1);
+
+    fs.initialize(new URI("s3a://landsat-pds"), conf);
+
+    // value should already be cached.
+    Assertions.assertThat(fs.getInstrumentation().getCounterValue(STORE_REGION_PROBE))
+        .describedAs("Incorrect number of calls made to get bucket region").isEqualTo(0);
   }
 
-  /**
-   * By setting the system property {@code "aws.region"} we can
-   * guarantee that the SDK region resolution chain will always succeed
-   * (and fast).
-   * Clearly there is no validation of the region during the build process.
-   */
   @Test
-  public void testBlankRegionTriggersSDKResolution() throws Throwable {
-    describe("Create a client with a blank region and the default endpoint."
-        + " This will trigger the SDK Resolution chain");
+  public void testEndpointOverride() throws Throwable {
+    describe("Create a client with no region and the default endpoint");
     Configuration conf = getConfiguration();
-    conf.set(AWS_REGION, "");
-    System.setProperty(AWS_REGION_SYSPROP, MARS_NORTH_2);
+
+    S3Client client = createS3Client(conf, AWS_ENDPOINT_TEST);
+
     try {
-      createMarsNorth2Client(conf);
-    } finally {
-      System.clearProperty(AWS_REGION_SYSPROP);
+      client.headBucket(HeadBucketRequest.builder().bucket(getFileSystem().getBucket()).build());
+    } catch (AwsServiceException exception) {
+      // Expected to be thrown by interceptor, do nothing.

Review Comment:
   if you use intercept() an exception is raised if the exception isn't 
actually thrown. do you want that?
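   a sketch of that form, using LambdaTestUtils.intercept() (the static import was removed in this diff and would need to come back):

```java
// fails the test if the interceptor does not actually raise the exception.
intercept(AwsServiceException.class, () ->
    client.headBucket(
        HeadBucketRequest.builder().bucket(getFileSystem().getBucket()).build()));
```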



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEndpointRegion.java:
##########
@@ -21,137 +21,138 @@
 import java.io.IOException;
 import java.net.URI;
 import java.net.URISyntaxException;
+import java.util.ArrayList;
+import java.util.List;
 
-import com.amazonaws.ClientConfiguration;
-import com.amazonaws.client.builder.AwsClientBuilder;
-import com.amazonaws.services.s3.AmazonS3;
-import com.amazonaws.util.AwsHostNameUtils;
 import org.assertj.core.api.Assertions;
+import org.junit.Assert;
 import org.junit.Test;
+import software.amazon.awssdk.awscore.AwsExecutionAttribute;
+import software.amazon.awssdk.awscore.exception.AwsServiceException;
+import software.amazon.awssdk.core.interceptor.Context;
+import software.amazon.awssdk.core.interceptor.ExecutionAttributes;
+import software.amazon.awssdk.core.interceptor.ExecutionInterceptor;
+import software.amazon.awssdk.services.s3.S3Client;
+import software.amazon.awssdk.services.s3.model.HeadBucketRequest;
+import software.amazon.awssdk.services.s3.model.S3Exception;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.s3a.statistics.impl.EmptyS3AStatisticsContext;
 
 import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
-import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_CENTRAL_REGION;
-import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_ENDPOINT;
-import static org.apache.hadoop.fs.s3a.impl.InternalConstants.AWS_REGION_SYSPROP;
-import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+import static org.apache.hadoop.fs.s3a.Constants.BUCKET_REGION_HEADER;
+import static org.apache.hadoop.fs.s3a.Statistic.STORE_REGION_PROBE;
+import static org.apache.hadoop.fs.s3a.impl.InternalConstants.SC_301_MOVED_PERMANENTLY;
 
 /**
  * Test to check correctness of S3A endpoint regions in
  * {@link DefaultS3ClientFactory}.
  */
 public class ITestS3AEndpointRegion extends AbstractS3ATestBase {
 
-  private static final String AWS_REGION_TEST = "test-region";
   private static final String AWS_ENDPOINT_TEST = "test-endpoint";
-  private static final String AWS_ENDPOINT_TEST_WITH_REGION =
-      "test-endpoint.some-region.amazonaws.com";
-  public static final String MARS_NORTH_2 = "mars-north-2";
 
-  /**
-   * Test to verify that setting a region with the config would bypass the
-   * construction of region from endpoint.
-   */
-  @Test
-  public void testWithRegionConfig() {
-    getFileSystem().getConf().set(AWS_REGION, AWS_REGION_TEST);
-
-    //Creating an endpoint config with a custom endpoint.
-    AwsClientBuilder.EndpointConfiguration epr = createEpr(AWS_ENDPOINT_TEST,
-        getFileSystem().getConf().getTrimmed(AWS_REGION));
-    //Checking if setting region config bypasses the endpoint region.
-    Assertions.assertThat(epr.getSigningRegion())
-        .describedAs("There is a region mismatch")
-        .isEqualTo(getFileSystem().getConf().get(AWS_REGION));
-  }
 
   /**
-   * Test to verify that not setting the region config, would lead to using
-   * endpoint to construct the region.
+   * Test to verify that not setting the region config, will lead to the client factory making
+   * a HEAD bucket call to configure the correct region. If an incorrect region is set, the HEAD
+   * bucket call in this test will raise an exception.
    */
   @Test
-  public void testWithoutRegionConfig() {
-    getFileSystem().getConf().unset(AWS_REGION);
-
-    //Creating an endpoint config with a custom endpoint containing a region.
-    AwsClientBuilder.EndpointConfiguration eprRandom =
-        createEpr(AWS_ENDPOINT_TEST_WITH_REGION,
-            getFileSystem().getConf().getTrimmed(AWS_REGION));
-    String regionFromEndpoint =
-        AwsHostNameUtils
-            .parseRegionFromAwsPartitionPattern(AWS_ENDPOINT_TEST_WITH_REGION);
-    //Checking if not setting region config leads to constructing the region
-    // from endpoint.
-    Assertions.assertThat(eprRandom.getSigningRegion())
-        .describedAs("There is a region mismatch")
-        .isNotEqualTo(getFileSystem().getConf().get(AWS_REGION))
-        .isEqualTo(regionFromEndpoint);
-  }
+  public void testWithoutRegionConfig() throws IOException {
+    Configuration conf = getConfiguration();
+    String bucket = getFileSystem().getBucket();
+    conf.unset(String.format("fs.s3a.bucket.%s.endpoint.region", bucket));
+    conf.unset(AWS_REGION);
 
-  /**
-   * Method to create EndpointConfiguration using an endpoint.
-   *
-   * @param endpoint the endpoint to be used for EndpointConfiguration creation.
-   * @return an instance of EndpointConfiguration.
-   */
-  private AwsClientBuilder.EndpointConfiguration createEpr(String endpoint,
-      String awsRegion) {
-    return DefaultS3ClientFactory.createEndpointConfiguration(endpoint,
-        new ClientConfiguration(), awsRegion);
+    S3AFileSystem fs = new S3AFileSystem();
+    fs.initialize(getFileSystem().getUri(), conf);
+
+    try  {
+      fs.getBucketMetadata();
+    } catch (S3Exception exception) {
+      if (exception.statusCode() == SC_301_MOVED_PERMANENTLY) {
+        List<String> bucketRegion =
+            exception.awsErrorDetails().sdkHttpResponse().headers().get(BUCKET_REGION_HEADER);
+        Assert.fail("Region not configured correctly, expected region: " + bucketRegion);

Review Comment:
   what about including the full toString() of the exception?
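   e.g.:

```java
Assert.fail("Region not configured correctly, expected region: "
    + bucketRegion + "; got: " + exception);
```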





> AWS SDK V2 - Complete outstanding items
> ---------------------------------------
>
>                 Key: HADOOP-18565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18565
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>
> The following work remains to complete the SDK upgrade work:
>  * S3A allows users configure to custom signers, add in support for this.
>  * Remove SDK V1 bundle dependency
>  * Update `getRegion()` logic to use retries. 
>  * Add in progress listeners for `S3ABlockOutputStream`
>  * Fix any failing tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
