Samrat002 opened a new pull request, #28136:
URL: https://github.com/apache/flink/pull/28136
## What is the purpose of the change
This pull request adds optional AWS Common Runtime (CRT) HTTP transport
support to `flink-s3-fs-native`. When enabled via `s3.crt.enabled: true`, the
module switches from Apache HTTP Client (sync) + Netty NIO (async) to
`AwsCrtHttpClient` for sync operations and `S3AsyncClient.crtBuilder()` for the
async client backing `S3TransferManager`. The CRT-based S3 client has built-in
multipart transfer acceleration and higher throughput via native I/O, which is
beneficial for large-scale S3 workloads.
CRT JARs (`aws-crt-client`, `aws-crt`) are intentionally **not bundled** in
the shaded fat JAR. The `aws-crt` artifact contains JNI-linked native libraries
whose C-side `FindClass` paths are hardcoded, so Maven shade relocation would
rename the Java classes without updating the names the native code looks up,
and those lookups would fail at runtime. Users who opt in must place these JARs
in the Flink plugin directory alongside the fat JAR.
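Because the CRT JARs are supplied by the user, an opt-in deployment can end up with the flag set but the classes missing. The following is a minimal sketch of a fail-fast classpath probe, not code from this PR: the class `CrtClasspathCheck` and its methods are hypothetical, and the description does not say whether the factory performs such a check. The two probed class names are the entry points I would expect from the `aws-crt` and `aws-crt-client` artifacts.

```java
/** Hypothetical helper (not part of the PR): probes for user-supplied CRT classes. */
final class CrtClasspathCheck {
    // Assumed entry-point classes of the aws-crt and aws-crt-client artifacts.
    static final String CRT_NATIVE_CLASS = "software.amazon.awssdk.crt.CRT";
    static final String CRT_HTTP_CLIENT_CLASS =
            "software.amazon.awssdk.http.crt.AwsCrtHttpClient";

    /** Returns true if the class can be located by the given loader. */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running static
            // initializers, so no native library is loaded by the probe.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Throws a descriptive error when CRT is requested but the JARs are absent. */
    static void requireCrtJars(ClassLoader loader) {
        if (!isPresent(CRT_NATIVE_CLASS, loader)
                || !isPresent(CRT_HTTP_CLIENT_CLASS, loader)) {
            throw new IllegalStateException(
                    "s3.crt.enabled is true, but aws-crt / aws-crt-client were not "
                            + "found; place both JARs next to the plugin fat JAR.");
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = CrtClasspathCheck.class.getClassLoader();
        System.out.println("aws-crt present: " + isPresent(CRT_NATIVE_CLASS, loader));
        System.out.println(
                "aws-crt-client present: " + isPresent(CRT_HTTP_CLIENT_CLASS, loader));
    }
}
```

A probe like this would need to run against the plugin's own class loader, since Flink loads each plugin in isolation.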
## Brief change log
- Added `software.amazon.awssdk:aws-crt-client` as a `provided`-scope
dependency (compile-only, excluded from shading)
- Added two new config options to `NativeS3FileSystemFactory`:
  - `s3.crt.enabled` (boolean, default `false`): switches both sync and
    async HTTP transport to CRT
  - `s3.crt.target-throughput-gbps` (double, default `10.0`): tunes the
    CRT async client's target throughput
- Extended `S3ClientProvider.Builder` with `useCrt`,
`crtTargetThroughputGbps`, and `crtMinPartSizeInBytes` fields; `build()` branches
on `useCrt` to construct either the CRT clients or the existing Apache + Netty
clients
- CRT async client is configured with `forcePathStyle`,
`checksumValidationEnabled`, `S3CrtRetryConfiguration`, and `maxConcurrency`
drawn from the existing connection config; `minimumPartSizeInBytes` maps from
the existing `s3.upload.min.part.size` setting
- Updated `CLAUDE.md` with CRT setup instructions and the shading
constraint rationale
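
The option defaults and the `useCrt` branch described in the change log can be sketched roughly as follows. This is a self-contained illustration rather than the PR's implementation: `CrtTransportSketch`, its `Transport` enum, and `fromConfig` are hypothetical names, the accessors are placed on the builder for brevity, and the real `build()` constructs AWS SDK clients instead of returning a label.

```java
import java.util.Map;

/** Illustrative only: hypothetical names, not the classes touched by the PR. */
final class CrtTransportSketch {

    /** Descriptive labels for the two client stacks; these are not SDK types. */
    enum Transport { APACHE_SYNC_NETTY_ASYNC, CRT_SYNC_CRT_ASYNC }

    static final class Builder {
        private boolean useCrt = false;                // default of s3.crt.enabled
        private double crtTargetThroughputGbps = 10.0; // default of s3.crt.target-throughput-gbps

        Builder useCrt(boolean value) {
            this.useCrt = value;
            return this;
        }

        Builder crtTargetThroughputGbps(double value) {
            this.crtTargetThroughputGbps = value;
            return this;
        }

        /** Reads the two options from a flat key/value map, applying the defaults. */
        Builder fromConfig(Map<String, String> conf) {
            useCrt = Boolean.parseBoolean(conf.getOrDefault("s3.crt.enabled", "false"));
            crtTargetThroughputGbps = Double.parseDouble(
                    conf.getOrDefault("s3.crt.target-throughput-gbps", "10.0"));
            return this;
        }

        boolean isUseCrt() {
            return useCrt;
        }

        double getCrtTargetThroughputGbps() {
            return crtTargetThroughputGbps;
        }

        /** Single branch point: CRT for both sync and async, or the existing stack. */
        Transport build() {
            return useCrt ? Transport.CRT_SYNC_CRT_ASYNC : Transport.APACHE_SYNC_NETTY_ASYNC;
        }
    }

    public static void main(String[] args) {
        Builder defaults = new Builder().fromConfig(Map.of());
        Builder crt = new Builder().fromConfig(Map.of("s3.crt.enabled", "true"));
        System.out.println("defaults -> " + defaults.build());
        System.out.println("crt      -> " + crt.build());
    }
}
```

Keeping the decision in one place means the sync and async clients cannot end up on different transports, which matches the description that the flag switches both.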
## Verifying this change
This change added tests and can be verified as follows:
- Added `testCrtDisabledByDefault()` in `S3ClientProviderTest` — asserts
that `isUseCrt()` is `false` when no CRT config is provided
- Added `testCrtFlagIsRecorded()` in `S3ClientProviderTest`, which asserts that
`isUseCrt()` is `true` and `getCrtTargetThroughputGbps()` reflects the
configured value when `useCrt(true)` is set on the builder
- All existing module tests continue to pass (`mvn verify` clean)
- Functional end-to-end verification requires placing `aws-crt-client` and
`aws-crt` JARs in the plugin directory and pointing a Flink job at an S3 (or
MinIO) endpoint with `s3.crt.enabled: true`
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): **yes**, adds
`software.amazon.awssdk:aws-crt-client` as a `provided`-scope (compile-only)
dependency; not bundled in the fat JAR
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **no**
- The S3 file system connector: **yes** — `flink-s3-fs-native` only;
`flink-s3-fs-hadoop` and `flink-s3-fs-presto` are unaffected
## Documentation
- Does this pull request introduce a new feature? **yes**
- If yes, how is the feature documented? **JavaDocs** (config option
descriptions in `NativeS3FileSystemFactory`)
---
##### Was generative AI tooling used to co-author this PR?
- [ ] Yes