steveloughran commented on code in PR #7443:
URL: https://github.com/apache/hadoop/pull/7443#discussion_r1988945094


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java:
##########
@@ -145,7 +123,17 @@ public S3Client createS3Client(
   }
 
   @Override
-  public S3AsyncClient createS3AsyncClient(
+  public S3AsyncClient createS3AsyncClient(final URI uri,
+      final S3ClientCreationParameters parameters) throws IOException {
+    if (parameters.isCrtEnabled()) {
+      LOG_S3_CRT_ENABLED.info("The S3 CRT client is enabled");

Review Comment:
   why not at debug? If everything works, there's no need to print anything.
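   i.e. (a minimal sketch, assuming the class's existing `LOG` field rather than the once-only `LOG_S3_CRT_ENABLED`):
   ```java
   // debug only: of interest when diagnosing client setup, not in normal runs
   LOG.debug("The S3 CRT client is enabled");
   ```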



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:
##########
@@ -1767,6 +1767,48 @@ To disable checksum verification in `distcp`, use the `-skipcrccheck` option:
 hadoop distcp -update -skipcrccheck -numListstatusThreads 40 /user/alice/datasets s3a://alice-backup/datasets
 ```
 
+### <a name="distcp"></a> Using the S3 CRT client

Review Comment:
   pull all the CRT client stuff out into its own doc



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:
##########
@@ -1767,6 +1767,48 @@ To disable checksum verification in `distcp`, use the `-skipcrccheck` option:
 hadoop distcp -update -skipcrccheck -numListstatusThreads 40 /user/alice/datasets s3a://alice-backup/datasets
 ```
 
+### <a name="distcp"></a> Using the S3 CRT client
+
+The [AWS CRT-based S3 client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT) and is an alternative asynchronous S3 client. It has
+enhanced connection pool management, and can provide higher transfer from and to S3 due to its

Review Comment:
   higher transfer what? bandwidth, latency, support calls?
   
   also, the s3a block output stream splits PUT requests, and vector IO/analytics does the GET stuff, so what else does it offer? load balancing?



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSRegionEndpointInformation.java:
##########
@@ -0,0 +1,70 @@
+package org.apache.hadoop.fs.s3a.impl;
+

Review Comment:
   Pulling out is good.
   
   we need to fix that region resolution logic:
   
   * work in ec2/k8s without going out of region, automatically
   * add an option to use the default chain without the "add a space" trick, e.g. a region called "(sdk)". Not backwards compatible, but makes it clear what happens. (sketch below)
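   Something like this, say (the sentinel value and helper are hypothetical, not existing code):
   ```java
   import java.util.Optional;
   import software.amazon.awssdk.regions.Region;
   
   // hypothetical sentinel: "(sdk)" means "use the SDK's default region chain"
   static final String SDK_REGION_CHAIN = "(sdk)";
   
   static Optional<Region> resolveConfiguredRegion(String configuredRegion) {
     if (SDK_REGION_CHAIN.equals(configuredRegion)) {
       // empty: leave region selection entirely to the SDK
       return Optional.empty();
     }
     return Optional.of(Region.of(configuredRegion));
   }
   ```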



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java:
##########
@@ -356,7 +362,7 @@ public PutObjectRequest.Builder newPutObjectRequestBuilder(String key,
     }
 
     // Set the timeout for object uploads but not directory markers.
-    if (!isDirectoryMarker) {
+    if (!isDirectoryMarker && !isCRTEnabled) {

Review Comment:
   needs to go into the docs then



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSRegionEndpointResolver.java:
##########
@@ -0,0 +1,232 @@
+/*

Review Comment:
   this stuff is getting too complicated, and the endpoint/region code is a significant source of pain. Isolation, more tests etc. would be good.
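   A unit-test sketch of the central-endpoint fallback, for instance (the test name, the `withEndpoint()` call and a `getRegion()` accessor on the info class are assumptions):
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.assertj.core.api.Assertions;
   import org.junit.Test;
   import software.amazon.awssdk.regions.Region;
   
   @Test
   public void testCentralEndpointFallsBackToDefaultRegion() throws Throwable {
     // central endpoint, no region configured
     S3ClientFactory.S3ClientCreationParameters parameters =
         new S3ClientFactory.S3ClientCreationParameters()
             .withEndpoint("s3.amazonaws.com");
     AWSRegionEndpointInformation info =
         AWSRegionEndpointResolver.getEndpointRegionResolution(
             parameters, new Configuration());
     // expect the "cross region access fallback" region, us-east-2
     Assertions.assertThat(info.getRegion())
         .isEqualTo(Region.of("us-east-2"));
   }
   ```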



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -474,6 +474,11 @@ public class S3AFileSystem extends FileSystem implements StreamCapabilities,
    */
   private boolean fipsEnabled;
 
+  /**
+   * Is CRT enabled?
+   */
+  private boolean isCRTEnabled;

Review Comment:
   let's not add another boolean field here.
   
   In #7463 I've drafted a service under S3AStore which loads and stores config info, using an enumset for the flags. I'd really like to do reflection-based loading of values, the way abfs does. Anyone want to pick up that work?
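   The enumset idea, roughly (names here are illustrative only, not the actual #7463 draft):
   ```java
   import java.util.EnumSet;
   
   // illustrative: one set of flags instead of a growing list of boolean fields
   enum ClientFlag { CRT, FIPS, CROSS_REGION_ACCESS }
   
   final class ClientFlags {
     private final EnumSet<ClientFlag> flags;
   
     ClientFlags(EnumSet<ClientFlag> flags) {
       this.flags = EnumSet.copyOf(flags);
     }
   
     boolean enabled(ClientFlag flag) {
       return flags.contains(flag);
     }
   }
   ```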



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSRegionEndpointResolver.java:
##########
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.awscore.util.AwsHostNameUtils;
+import software.amazon.awssdk.regions.Region;
+
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.s3a.S3ClientFactory;
+import org.apache.hadoop.fs.store.LogExactlyOnce;
+
+import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
+import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_CROSS_REGION_ACCESS_ENABLED;
+import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_CROSS_REGION_ACCESS_ENABLED_DEFAULT;
+import static org.apache.hadoop.fs.s3a.Constants.AWS_S3_DEFAULT_REGION;
+import static org.apache.hadoop.fs.s3a.Constants.CENTRAL_ENDPOINT;
+import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_SECURE_CONNECTIONS;
+import static org.apache.hadoop.fs.s3a.Constants.FIPS_ENDPOINT;
+import static org.apache.hadoop.fs.s3a.Constants.SECURE_CONNECTIONS;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * This class uses Hadoop configurations to resolve endpoint and region information which
+ * is then set in the SDK clients.
+ */
+public class AWSRegionEndpointResolver {
+
+  private static final String S3_SERVICE_NAME = "s3";
+
+  private static final Pattern VPC_ENDPOINT_PATTERN =
+      Pattern.compile("^(?:.+\\.)?([a-z0-9-]+)\\.vpce\\.amazonaws\\.(?:com|com\\.cn)$");
+
+  protected static final Logger LOG =
+      LoggerFactory.getLogger(AWSRegionEndpointResolver.class);
+
+  /**
+   * A one-off warning of default region chains in use.
+   */
+  private static final LogExactlyOnce WARN_OF_DEFAULT_REGION_CHAIN =
+      new LogExactlyOnce(LOG);
+
+  /**
+   * Warning message printed when the SDK Region chain is in use.
+   */
+  private static final String SDK_REGION_CHAIN_IN_USE =
+      "S3A filesystem client is using"
+          + " the SDK region resolution chain.";
+
+  /**
+   * Error message when an endpoint is set with FIPS enabled: {@value}.
+   */
+  @VisibleForTesting
+  public static final String ERROR_ENDPOINT_WITH_FIPS =
+      "Non central endpoint cannot be set when " + FIPS_ENDPOINT + " is true";
+
+
+  public static AWSRegionEndpointInformation getEndpointRegionResolution(S3ClientFactory.S3ClientCreationParameters parameters,
+      Configuration conf) {
+    final String endpointStr = parameters.getEndpoint();
+    final URI endpoint = getS3Endpoint(endpointStr, conf);
+
+    AWSRegionEndpointInformation.Builder builder = new AWSRegionEndpointInformation.Builder();
+
+    final String configuredRegion = parameters.getRegion();
+    Region region = null;
+    String origin = "";
+
+    // If the region was configured, set it.
+    if (configuredRegion != null && !configuredRegion.isEmpty()) {
+      origin = AWS_REGION;
+      region = Region.of(configuredRegion);
+    }
+
+    // FIPS? Log it, then reject any attempt to set an endpoint
+    final boolean fipsEnabled = parameters.isFipsEnabled();
+    if (fipsEnabled) {
+      LOG.debug("Enabling FIPS mode");
+    }
+    // always setting it guarantees the value is non-null,
+    // which tests expect.
+    builder.fipsEnabled(fipsEnabled);
+
+    if (endpoint != null) {
+      boolean endpointEndsWithCentral =
+          endpointStr.endsWith(CENTRAL_ENDPOINT);
+      checkArgument(!fipsEnabled || endpointEndsWithCentral, "%s : %s",
+          ERROR_ENDPOINT_WITH_FIPS,
+          endpoint);
+
+      // No region was configured,
+      // determine the region from the endpoint.
+      if (region == null) {
+        region = getS3RegionFromEndpoint(endpointStr,
+            endpointEndsWithCentral);
+        if (region != null) {
+          origin = "endpoint";
+        }
+      }
+
+      // No need to override endpoint with "s3.amazonaws.com".
+      // Let the client take care of endpoint resolution. Overriding
+      // the endpoint with "s3.amazonaws.com" causes 400 Bad Request
+      // errors for non-existent buckets and objects.
+      // ref: https://github.com/aws/aws-sdk-java-v2/issues/4846
+      if (!endpointEndsWithCentral) {
+        builder.withEndpoint(endpoint);
+        LOG.debug("Setting endpoint to {}", endpoint);
+      } else {
+        origin = "central endpoint with cross region access";
+        LOG.debug("Enabling cross region access for endpoint {}",
+            endpointStr);
+      }
+    }
+
+    if (region != null) {
+      builder.withRegion(region);
+    } else if (configuredRegion == null) {
+      // no region is configured, and none could be determined from the endpoint.
+      // Use US_EAST_2 as default.
+      region = Region.of(AWS_S3_DEFAULT_REGION);
+      builder.withRegion(region);
+      origin = "cross region access fallback";
+    } else if (configuredRegion.isEmpty()) {
+      // region configuration was set to empty string.
+      // allow this if people really want it; it is OK to rely on this
+      // when deployed in EC2.
+      WARN_OF_DEFAULT_REGION_CHAIN.warn(SDK_REGION_CHAIN_IN_USE);

Review Comment:
   let's just log at debug
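   i.e. something like (dropping the once-only warn; both names are from the code above):
   ```java
   // debug is enough: relying on the SDK chain is a deliberate choice
   LOG.debug(SDK_REGION_CHAIN_IN_USE);
   ```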



##########
hadoop-tools/hadoop-aws/pom.xml:
##########
@@ -479,6 +494,11 @@
       <artifactId>bundle</artifactId>
       <scope>compile</scope>
     </dependency>
+    <dependency>
+      <groupId>software.amazon.awssdk.crt</groupId>
+      <artifactId>aws-crt</artifactId>
+      <scope>compile</scope>

Review Comment:
   `provided`, unless it really is to be shipped
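   i.e.:
   ```xml
   <dependency>
     <groupId>software.amazon.awssdk.crt</groupId>
     <artifactId>aws-crt</artifactId>
     <!-- provided: on the compile classpath, not shipped in the artifact -->
     <scope>provided</scope>
   </dependency>
   ```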



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

