gaborgsomogyi commented on code in PR #28136: URL: https://github.com/apache/flink/pull/28136#discussion_r3272532603
########## flink-filesystems/flink-s3-fs-native/README.md: ########## @@ -327,6 +327,61 @@ When enabled, file uploads automatically use TransferManager for: - Better utilization of available bandwidth - Lower heap requirements for write operations +## AWS Common Runtime (CRT) Support + +The filesystem optionally supports the [AWS Common Runtime (CRT)](https://github.com/awslabs/aws-crt-java) HTTP transport +for higher throughput on large S3 workloads. + +When enabled, the CRT transport replaces: +- **Sync client**: Apache HTTP Client → `AwsCrtHttpClient` +- **Async client**: Netty NIO → `S3AsyncClient.crtBuilder()` (with built-in multipart acceleration) + +### Prerequisites + +The `aws-crt` artifact contains JNI-linked native libraries whose C-side `FindClass` paths are +hardcoded, making Maven shade relocation incompatible. Therefore **CRT JARs are not bundled** in +the fat JAR and must be placed manually. + +### Setup (step-by-step) + +1. Download `aws-crt-client-<version>.jar` (`groupId: software.amazon.awssdk`) from Maven Central, + matching the AWS SDK version used by this module (check `fs.s3.aws.sdk.version` in `pom.xml` — + e.g. `2.44.4`). + +2. Resolve the matching `aws-crt-<version>.jar` (`groupId: software.amazon.awssdk.crt`). **Note:** + `aws-crt` uses an independent versioning scheme (e.g. `0.33.x`) that does **not** track the AWS + SDK version. The compatible `aws-crt` version is declared as a dependency in + `aws-crt-client-<version>.pom` — open that POM on Maven Central and read the + `<dependency>` entry for `software.amazon.awssdk.crt:aws-crt` to find the exact version to + download. + +3. Place both JARs in the Flink plugin directory alongside `flink-s3-fs-native.jar`: + + ```bash + cp aws-crt-client-<version>.jar $FLINK_HOME/plugins/s3-fs-native/ + cp aws-crt-<version>.jar $FLINK_HOME/plugins/s3-fs-native/ Review Comment: The version guidance is now correct, but the manual two-step download is still error-prone. Consider adding a helper script at `flink-filesystems/flink-s3-fs-native/tools/download-crt-jars.sh` that resolves and downloads the exact compatible JARs automatically: ```bash #!/usr/bin/env bash # Downloads aws-crt-client and aws-crt JARs (excluded from the fat JAR due to JNI shading) # to ./crt-jars/. Copy the output to $FLINK_HOME/plugins/s3-fs-native/ alongside # flink-s3-fs-native.jar to enable CRT transport (s3.crt.enabled=true). # # Usage: download-crt-jars.sh [AWS_CRT_VERSION] # AWS_CRT_VERSION aws-crt version to download (default: pinned value below). # Update the default when fs.s3.aws.sdk.version is bumped; # the compatible version is in aws-sdk-java-pom-<sdk>.pom -> awscrt.version. set -euo pipefail SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" SDK_VERSION=$(grep -oP '(?<=<fs.s3.aws.sdk.version>)[^<]+' "${SCRIPT_DIR}/../pom.xml") AWS_CRT_VERSION=${1:-0.45.1} # pinned for sdk ${SDK_VERSION} OUT_DIR="${SCRIPT_DIR}/../crt-jars" mkdir -p "${OUT_DIR}" mvn dependency:copy \ -Dartifact="software.amazon.awssdk:aws-crt-client:${SDK_VERSION}:jar" \ -DoutputDirectory="${OUT_DIR}" -Dmdep.useBaseVersion=true mvn dependency:copy \ -Dartifact="software.amazon.awssdk.crt:aws-crt:${AWS_CRT_VERSION}:jar" \ -DoutputDirectory="${OUT_DIR}" -Dmdep.useBaseVersion=true echo "Done. Copy ${OUT_DIR}/*.jar to \$FLINK_HOME/plugins/s3-fs-native/." ``` The script reads `fs.s3.aws.sdk.version` directly from the module POM so it stays in sync automatically. `AWS_CRT_VERSION` can be overridden as a positional argument when needed; otherwise the pinned default is used. The README setup section should then reference this script instead of the manual download steps. ########## flink-filesystems/flink-s3-fs-native/src/main/java/org/apache/flink/fs/s3native/S3ClientProvider.java: ########## @@ -326,6 +359,12 @@ public static class Builder { // Custom credentials provider class names (comma-separated) @Nullable private String credentialsProviderClasses; + // CRT configuration + private boolean useCrt = false; + @Nullable private Double crtTargetThroughputGbps = null; + private long crtMinPartSizeInBytes = + NativeS3FileSystemFactory.PART_UPLOAD_MIN_SIZE.defaultValue(); Review Comment: `crtMinPartSizeInBytes` correctly derives its default from `NativeS3FileSystemFactory.PART_UPLOAD_MIN_SIZE.defaultValue()`, but the pattern is applied inconsistently. The Builder also hardcodes defaults for `chunkedEncoding`, `checksumValidation`, `maxConnections`, `connectionTimeout`, `socketTimeout`, `connectionMaxIdleTime`, `maxRetries`, `clientCloseTimeout`, and others — all of which have a canonical `ConfigOption` definition in `NativeS3FileSystemFactory`. If any of those config defaults change, the Builder silently drifts. All Builder field defaults should derive from their corresponding `ConfigOption.defaultValue()`, not be re-declared as magic literals. For example: ```java private boolean chunkedEncoding = NativeS3FileSystemFactory.CHUNKED_ENCODING_ENABLED.defaultValue(); private boolean checksumValidation = NativeS3FileSystemFactory.CHECKSUM_VALIDATION_ENABLED.defaultValue(); private int maxConnections = NativeS3FileSystemFactory.MAX_CONNECTIONS.defaultValue(); // etc. ``` This makes `NativeS3FileSystemFactory` the single source of truth for all defaults. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
