[ https://issues.apache.org/jira/browse/HADOOP-19399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948485#comment-17948485 ]
ASF GitHub Bot commented on HADOOP-19399: ----------------------------------------- steveloughran commented on code in PR #7443: URL: https://github.com/apache/hadoop/pull/7443#discussion_r2068647513 ########## hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md: ########## @@ -0,0 +1,66 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# Using the S3 CRT client + +This document explains usage of the CRT client. + +The [AWS CRT-based S3 client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html) +is built on top of the AWS Common Runtime (CRT), is an alternative S3 asynchronous client. It can provide higher +transfer throughput from S3 due to its enhanced connection pool management. More information +can be found +[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/). + +When making multiple parallel GET requests, using the CRT ensures load is evenly distributed across S3. This can be +useful for all three input streams available with versions > 3.4.2, as: +* The Analytics accelerator will make parallel GETs for columns it predicts will be required on a Parquet file open. +* The Prefetch input stream can make up to 8 parallel 8MB GETs by default. +* The Classic input stream will make async parallel GETs for column reads when using VectoredIO. + +Since the CRT client is an async client, when enabled in S3A, it will currently be used wherever +* When reading data using the Analytics stream. +* When copying files between buckets using the Transfer manager, for example, on a rename. +* When copying from the local file system to S3. + +This is because all other operations in S3A currently use the S3 Sync client, and so cannot be replaced by an +asynchronous client. The move to an async client is tracked in +[HADOOP-18877](https://issues.apache.org/jira/browse/HADOOP-18877). + +## Enabling the CRT Client +The CRT client can be enabled as follows: + +``` Review Comment: nit: xml ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java: ########## @@ -371,14 +426,23 @@ private static URI buildURI(String scheme, String host, int port) { * @param clientConfig AWS SDK configuration to update */ private static void initUserAgent(Configuration conf, - ClientOverrideConfiguration.Builder clientConfig) { + ClientOverrideConfiguration.Builder clientConfig) { + clientConfig.putAdvancedOption(SdkAdvancedClientOption.USER_AGENT_PREFIX, getUserAgent(conf)); + } + + /** + * Builds user agent string Review Comment: nit: . ########## hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md: ########## @@ -0,0 +1,66 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# Using the S3 CRT client + +This document explains usage of the CRT client. + +The [AWS CRT-based S3 client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html) +is built on top of the AWS Common Runtime (CRT), is an alternative S3 asynchronous client. It can provide higher +transfer throughput from S3 due to its enhanced connection pool management. More information +can be found +[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/). + +When making multiple parallel GET requests, using the CRT ensures load is evenly distributed across S3. This can be Review Comment: S3 front end servers, S3 back end stores -or both? ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java: ########## @@ -212,6 +246,27 @@ public static RetryPolicy.Builder createRetryPolicyBuilder(Configuration conf) { return retryPolicyBuilder; } + + private static S3CrtProxyConfiguration mapCRTProxyConfiguration( + software.amazon.awssdk.http.nio.netty.ProxyConfiguration proxyConfiguration) { Review Comment: this the same for the unshaded SDK JARs? ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java: ########## @@ -175,6 +163,39 @@ public S3AsyncClient createS3AsyncClient( return s3AsyncClientBuilder.build(); } + private S3AsyncClient createS3CrtAsyncClient(URI uri, S3ClientCreationParameters parameters) Review Comment: Safe if these are the only places of this source file where the crt classes are referenced. ########## hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/crt_client.md: ########## @@ -0,0 +1,66 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# Using the S3 CRT client + +This document explains usage of the CRT client. + +The [AWS CRT-based S3 client](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html) +is built on top of the AWS Common Runtime (CRT), is an alternative S3 asynchronous client. It can provide higher +transfer throughput from S3 due to its enhanced connection pool management. More information +can be found +[here](https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/). + +When making multiple parallel GET requests, using the CRT ensures load is evenly distributed across S3. This can be +useful for all three input streams available with versions > 3.4.2, as: Review Comment: `>=` ########## hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md: ########## @@ -50,6 +50,7 @@ full details. * [Auditing Architecture](./auditing_architecture.html). * [Testing](./testing.html) * [S3Guard](./s3guard.html) +* [Using the S3 CRT Client](./crt_client.html). Review Comment: 1. point to this in the reading.md file 2. in performance.md, say that the analytics stream and crt client may also provide performance improvements ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java: ########## @@ -1837,12 +1837,26 @@ private Constants() { */ public static final String S3A_IO_RATE_LIMIT = "fs.s3a.io.rate.limit"; - /** * Prefix to configure Analytics Accelerator Library. * Value: {@value}. */ public static final String ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX = "fs.s3a.analytics.accelerator"; + /** + * Flag to enable the CRT client. Value {@value}. + */ + public static final String CRT_CLIENT_ENABLED = "fs.s3a.crt.enabled"; + + /** + * Default value for {@link #DEFAULT_CRT_ENABLED}. + * Value: {@value}. + */ + public static final boolean DEFAULT_CRT_ENABLED = false; + + /** + * Requester pays header value. Value {@value}. + */ + public static final String REQUESTER_PAYS_HEADER_VALUE = "requester"; Review Comment: make an InternalConstant unless there's a need to expose it ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java: ########## @@ -1837,12 +1837,26 @@ private Constants() { */ public static final String S3A_IO_RATE_LIMIT = "fs.s3a.io.rate.limit"; - /** * Prefix to configure Analytics Accelerator Library. * Value: {@value}. */ public static final String ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX = "fs.s3a.analytics.accelerator"; + /** + * Flag to enable the CRT client. Value {@value}. + */ + public static final String CRT_CLIENT_ENABLED = "fs.s3a.crt.enabled"; Review Comment: can you export this in the filesystem hasPathCapability, and add to InternalContants.S3A_DYNAMIC_CAPABILITIES , for bucket-info and storediag. thanks. ########## hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java: ########## @@ -709,6 +742,18 @@ public void setEncryptionSecrets(final EncryptionSecrets secrets) { encryptionSecrets = secrets; } + private void addRequestLevelHeaders(AwsRequest.Builder requestBuilder) { + AwsRequestOverrideConfiguration.Builder builder = AwsRequestOverrideConfiguration.builder(); + + if (isRequesterPays) { + builder.putHeader(REQUESTER_PAYS_HEADER, REQUESTER_PAYS_HEADER_VALUE); + } + + builder.putHeader(USER_AGENT, userAgent); Review Comment: not for this PR, but maybe in future we could add the auditor ? strings here so it gets through the CRT *and* all the way to cloudtrail > S3A: Add support for CRT client > ------------------------------- > > Key: HADOOP-19399 > URL: https://issues.apache.org/jira/browse/HADOOP-19399 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 > Affects Versions: 3.4.1 > Reporter: Ahmar Suhail > Priority: Major > Labels: pull-request-available > Fix For: 3.4.2 > > > * Allow ClientManager to initialise a CRT client > * Should be optional, java async client to still be supported as defaultĀ -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org