[ 
https://issues.apache.org/jira/browse/HADOOP-19399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948485#comment-17948485
 ] 

ASF GitHub Bot commented on HADOOP-19399:
-----------------------------------------

steveloughran commented on code in PR #7443:
URL: https://github.com/apache/hadoop/pull/7443#discussion_r2068647513


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md:
##########
@@ -0,0 +1,66 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Using the S3 CRT client
+
+This document explains usage of the CRT client.
+
+The [AWS CRT-based S3 
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT), is an alternative S3 
asynchronous client. It can provide higher
+transfer throughput from S3 due to its enhanced connection pool management. 
More information
+can be found
+[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+When making multiple parallel GET requests, using the CRT ensures load is 
evenly distributed across S3. This can be
+useful for all three input streams available with versions > 3.4.2, as:
+* The Analytics accelerator will make parallel GETs for columns it predicts 
will be required on a Parquet file open.
+* The Prefetch input stream can make up to 8 parallel 8MB GETs by default.
+* The Classic input stream will make async parallel GETs for column reads when 
using VectoredIO.
+
+Since the CRT client is an async client, when enabled in S3A, it will 
currently be used wherever
+* When reading data using the Analytics stream.
+* When copying files between buckets using the Transfer manager, for example, 
on a rename.
+* When copying from the local file system to S3.
+
+This is because all other operations in S3A currently use the S3 Sync client, 
and so cannot be replaced by an
+asynchronous client. The move to an async client is tracked in
+[HADOOP-18877](https://issues.apache.org/jira/browse/HADOOP-18877).
+
+## Enabling the CRT Client
+The CRT client can be enabled as follows:
+
+```

Review Comment:
   nit: xml



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##########
@@ -371,14 +426,23 @@ private static URI buildURI(String scheme, String host, 
int port) {
    * @param clientConfig AWS SDK configuration to update
    */
   private static void initUserAgent(Configuration conf,
-      ClientOverrideConfiguration.Builder clientConfig) {
+     ClientOverrideConfiguration.Builder clientConfig) {
+     clientConfig.putAdvancedOption(SdkAdvancedClientOption.USER_AGENT_PREFIX, 
getUserAgent(conf));
+  }
+
+  /**
+   * Builds user agent string

Review Comment:
   nit: .



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md:
##########
@@ -0,0 +1,66 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Using the S3 CRT client
+
+This document explains usage of the CRT client.
+
+The [AWS CRT-based S3 
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT), is an alternative S3 
asynchronous client. It can provide higher
+transfer throughput from S3 due to its enhanced connection pool management. 
More information
+can be found
+[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+When making multiple parallel GET requests, using the CRT ensures load is 
evenly distributed across S3. This can be

Review Comment:
   S3 front end servers, S3 back end stores -or both?



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##########
@@ -212,6 +246,27 @@ public static RetryPolicy.Builder 
createRetryPolicyBuilder(Configuration conf) {
     return retryPolicyBuilder;
   }
 
+
+  private static S3CrtProxyConfiguration mapCRTProxyConfiguration(
+          software.amazon.awssdk.http.nio.netty.ProxyConfiguration 
proxyConfiguration) {

Review Comment:
   this the same for the unshaded SDK JARs? 



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java:
##########
@@ -175,6 +163,39 @@ public S3AsyncClient createS3AsyncClient(
     return s3AsyncClientBuilder.build();
   }
 
+  private S3AsyncClient createS3CrtAsyncClient(URI uri, 
S3ClientCreationParameters parameters)

Review Comment:
   Safe if these are the only places of this source file where the crt classes 
are referenced.



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md:
##########
@@ -0,0 +1,66 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Using the S3 CRT client
+
+This document explains usage of the CRT client.
+
+The [AWS CRT-based S3 
client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html)
+is built on top of the AWS Common Runtime (CRT), is an alternative S3 
asynchronous client. It can provide higher
+transfer throughput from S3 due to its enhanced connection pool management. 
More information
+can be found
+[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/).
+
+When making multiple parallel GET requests, using the CRT ensures load is 
evenly distributed across S3. This can be
+useful for all three input streams available with versions > 3.4.2, as:

Review Comment:
   `>=`



##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md:
##########
@@ -50,6 +50,7 @@ full details.
 * [Auditing Architecture](./auditing_architecture.html).
 * [Testing](./testing.html)
 * [S3Guard](./s3guard.html)
+* [Using the S3 CRT Client](./crt_client.html).

Review Comment:
   1. point to this in the reading.md file
   2. in performance.md, say that the analytics stream and crt client may also 
provide performance improvements



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1837,12 +1837,26 @@ private Constants() {
    */
   public static final String S3A_IO_RATE_LIMIT = "fs.s3a.io.rate.limit";
 
-
   /**
    * Prefix to configure Analytics Accelerator Library.
    * Value: {@value}.
    */
   public static final String ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX =
           "fs.s3a.analytics.accelerator";
 
+  /**
+   * Flag to enable the CRT client. Value {@value}.
+   */
+  public static final String CRT_CLIENT_ENABLED = "fs.s3a.crt.enabled";
+
+  /**
+   * Default value for {@link #DEFAULT_CRT_ENABLED}.
+   * Value: {@value}.
+   */
+  public static final boolean DEFAULT_CRT_ENABLED = false;
+
+  /**
+   * Requester pays header value. Value {@value}.
+   */
+  public static final String REQUESTER_PAYS_HEADER_VALUE = "requester";

Review Comment:
   make an InternalConstant unless there's a need to expose it



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1837,12 +1837,26 @@ private Constants() {
    */
   public static final String S3A_IO_RATE_LIMIT = "fs.s3a.io.rate.limit";
 
-
   /**
    * Prefix to configure Analytics Accelerator Library.
    * Value: {@value}.
    */
   public static final String ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX =
           "fs.s3a.analytics.accelerator";
 
+  /**
+   * Flag to enable the CRT client. Value {@value}.
+   */
+  public static final String CRT_CLIENT_ENABLED = "fs.s3a.crt.enabled";

Review Comment:
   can you export this in the filesystem hasPathCapability, and add to 
InternalContants.S3A_DYNAMIC_CAPABILITIES , for bucket-info and storediag. 
thanks.



##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java:
##########
@@ -709,6 +742,18 @@ public void setEncryptionSecrets(final EncryptionSecrets 
secrets) {
     encryptionSecrets = secrets;
   }
 
+  private void addRequestLevelHeaders(AwsRequest.Builder requestBuilder) {
+    AwsRequestOverrideConfiguration.Builder builder = 
AwsRequestOverrideConfiguration.builder();
+
+    if (isRequesterPays) {
+      builder.putHeader(REQUESTER_PAYS_HEADER, REQUESTER_PAYS_HEADER_VALUE);
+    }
+
+    builder.putHeader(USER_AGENT, userAgent);

Review Comment:
   not for this PR, but maybe in future we could add the auditor ? strings here 
so it gets through the CRT *and* all the way to cloudtrail





> S3A: Add support for CRT client
> -------------------------------
>
>                 Key: HADOOP-19399
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19399
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.1
>            Reporter: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.2
>
>
> * Allow ClientManager to initialise a CRT client
>  * Should be optional, java async client to still be supported as defaultĀ 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to