steveloughran commented on code in PR #7443:
URL: https://github.com/apache/hadoop/pull/7443#discussion_r2068647513


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md:
##########
@@ -0,0 +1,66 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Using the S3 CRT client
+
+This document explains usage of the CRT client.
+
+The [AWS CRT-based S3 
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT), is an alternative S3 
asynchronous client. It can provide higher
+transfer throughput from S3 due to its enhanced connection pool management. 
More information
+can be found
+[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+When making multiple parallel GET requests, using the CRT ensures load is 
evenly distributed across S3. This can be
+useful for all three input streams available with versions > 3.4.2, as:
+* The Analytics accelerator will make parallel GETs for columns it predicts 
will be required on a Parquet file open.
+* The Prefetch input stream can make up to 8 parallel 8MB GETs by default.
+* The Classic input stream will make async parallel GETs for column reads when 
using VectoredIO.
+
+Since the CRT client is an async client, when enabled in S3A, it will 
currently be used wherever
+* When reading data using the Analytics stream.
+* When copying files between buckets using the Transfer manager, for example, 
on a rename.
+* When copying from the local file system to S3.
+
+This is because all other operations in S3A currently use the S3 Sync client, 
and so cannot be replaced by an
+asynchronous client. The move to an async client is tracked in
+[HADOOP-18877](https://issues.apache.org/jira/browse/HADOOP-18877).
+
+## Enabling the CRT Client
+The CRT client can be enabled as follows:
+
+```

Review Comment:
   nit: xml



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##########
@@ -371,14 +426,23 @@ private static URI buildURI(String scheme, String host, 
int port) {
    * @param clientConfig AWS SDK configuration to update
    */
   private static void initUserAgent(Configuration conf,
-      ClientOverrideConfiguration.Builder clientConfig) {
+     ClientOverrideConfiguration.Builder clientConfig) {
+     clientConfig.putAdvancedOption(SdkAdvancedClientOption.USER_AGENT_PREFIX, 
getUserAgent(conf));
+  }
+
+  /**
+   * Builds user agent string

Review Comment:
   nit: .



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md:
##########
@@ -0,0 +1,66 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Using the S3 CRT client
+
+This document explains usage of the CRT client.
+
+The [AWS CRT-based S3 
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT), is an alternative S3 
asynchronous client. It can provide higher
+transfer throughput from S3 due to its enhanced connection pool management. 
More information
+can be found
+[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+When making multiple parallel GET requests, using the CRT ensures load is 
evenly distributed across S3. This can be

Review Comment:
   S3 front end servers, S3 back end stores -or both?



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##########
@@ -212,6 +246,27 @@ public static RetryPolicy.Builder 
createRetryPolicyBuilder(Configuration conf) {
     return retryPolicyBuilder;
   }
 
+
+  private static S3CrtProxyConfiguration mapCRTProxyConfiguration(
+          software.amazon.awssdk.http.nio.netty.ProxyConfiguration 
proxyConfiguration) {

Review Comment:
   this the same for the unshaded SDK JARs? 



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java:
##########
@@ -175,6 +163,39 @@ public S3AsyncClient createS3AsyncClient(
     return s3AsyncClientBuilder.build();
   }
 
+  private S3AsyncClient createS3CrtAsyncClient(URI uri, 
S3ClientCreationParameters parameters)

Review Comment:
   Safe if these are the only places of this source file where the crt classes 
are referenced.



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md:
##########
@@ -0,0 +1,66 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Using the S3 CRT client
+
+This document explains usage of the CRT client.
+
+The [AWS CRT-based S3 
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT), is an alternative S3 
asynchronous client. It can provide higher
+transfer throughput from S3 due to its enhanced connection pool management. 
More information
+can be found
+[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+When making multiple parallel GET requests, using the CRT ensures load is 
evenly distributed across S3. This can be
+useful for all three input streams available with versions > 3.4.2, as:

Review Comment:
   `>=`



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:
##########
@@ -50,6 +50,7 @@ full details.
 * [Auditing Architecture](./auditing_architecture.html).
 * [Testing](./testing.html)
 * [S3Guard](./s3guard.html)
+* [Using the S3 CRT Client](./crt_client.html).

Review Comment:
   1. point to this in the reading.md file
   2. in performance.md, say that the analytics stream and crt client may also 
provide performance improvements



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1837,12 +1837,26 @@ private Constants() {
    */
   public static final String S3A_IO_RATE_LIMIT = "fs.s3a.io.rate.limit";
 
-
   /**
    * Prefix to configure Analytics Accelerator Library.
    * Value: {@value}.
    */
   public static final String ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX =
           "fs.s3a.analytics.accelerator";
 
+  /**
+   * Flag to enable the CRT client. Value {@value}.
+   */
+  public static final String CRT_CLIENT_ENABLED = "fs.s3a.crt.enabled";
+
+  /**
+   * Default value for {@link #DEFAULT_CRT_ENABLED}.
+   * Value: {@value}.
+   */
+  public static final boolean DEFAULT_CRT_ENABLED = false;
+
+  /**
+   * Requester pays header value. Value {@value}.
+   */
+  public static final String REQUESTER_PAYS_HEADER_VALUE = "requester";

Review Comment:
   make an InternalConstant unless there's a need to expose it



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1837,12 +1837,26 @@ private Constants() {
    */
   public static final String S3A_IO_RATE_LIMIT = "fs.s3a.io.rate.limit";
 
-
   /**
    * Prefix to configure Analytics Accelerator Library.
    * Value: {@value}.
    */
   public static final String ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX =
           "fs.s3a.analytics.accelerator";
 
+  /**
+   * Flag to enable the CRT client. Value {@value}.
+   */
+  public static final String CRT_CLIENT_ENABLED = "fs.s3a.crt.enabled";

Review Comment:
   can you export this in the filesystem hasPathCapability, and add to 
InternalContants.S3A_DYNAMIC_CAPABILITIES , for bucket-info and storediag. 
thanks.



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java:
##########
@@ -709,6 +742,18 @@ public void setEncryptionSecrets(final EncryptionSecrets 
secrets) {
     encryptionSecrets = secrets;
   }
 
+  private void addRequestLevelHeaders(AwsRequest.Builder requestBuilder) {
+    AwsRequestOverrideConfiguration.Builder builder = 
AwsRequestOverrideConfiguration.builder();
+
+    if (isRequesterPays) {
+      builder.putHeader(REQUESTER_PAYS_HEADER, REQUESTER_PAYS_HEADER_VALUE);
+    }
+
+    builder.putHeader(USER_AGENT, userAgent);

Review Comment:
   not for this PR, but maybe in future we could add the auditor ? strings here 
so it gets through the CRT *and* all the way to cloudtrail



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to