Samrat002 opened a new pull request, #28136:
URL: https://github.com/apache/flink/pull/28136

## What is the purpose of the change
This pull request adds optional AWS Common Runtime (CRT) HTTP transport support to `flink-s3-fs-native`. When `s3.crt.enabled: true` is set, the module replaces Apache HTTP Client (sync) and Netty NIO (async) with `AwsCrtHttpClient` for synchronous operations and with `S3AsyncClient.crtBuilder()` for the asynchronous client backing `S3TransferManager`. The CRT-based S3 client provides built-in multipart transfer acceleration and can reach higher throughput through its native I/O layer, which benefits large-scale S3 workloads.
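The switch amounts to a single branch on the flag. The sketch below is hypothetical and stdlib-only: `TransportChoice` and `select` are illustrative names, not code from this PR, and the chosen transports are returned as strings so the example runs without the AWS SDK on the classpath. In the module itself, `S3ClientProvider.Builder.build()` constructs the actual clients.

```java
/**
 * Hypothetical, stdlib-only illustration of the transport switch controlled by
 * s3.crt.enabled. Client names are returned as plain strings so the sketch runs
 * without the AWS SDK; it does not construct any real clients.
 */
final class TransportChoice {

    /** Transport used for synchronous calls and for the async client behind S3TransferManager. */
    record Transports(String syncTransport, String asyncTransport) {}

    private TransportChoice() {}

    /** Returns the CRT transports when useCrt is true, otherwise the existing defaults. */
    static Transports select(boolean useCrt) {
        if (useCrt) {
            // Opt-in path: CRT HTTP client for sync calls, CRT-based S3 async client for transfers.
            return new Transports("AwsCrtHttpClient", "S3AsyncClient.crtBuilder()");
        }
        // Default path: Apache HTTP Client for sync calls, Netty NIO for async calls.
        return new Transports("ApacheHttpClient", "NettyNioAsyncHttpClient");
    }

    public static void main(String[] args) {
        System.out.println(select(false)); // default transports
        System.out.println(select(true));  // CRT transports
    }
}
```

Keeping both paths behind one boolean means the default behavior is untouched unless a user explicitly opts in.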
   
CRT JARs (`aws-crt-client`, `aws-crt`) are intentionally **not bundled** in the shaded fat JAR. The `aws-crt` artifact contains JNI-linked native libraries whose C-side `FindClass` paths are hardcoded, so Maven shade relocation would break them: relocated class names no longer match the names the native code looks up. Users who opt in must place these JARs in the Flink plugin directory alongside the fat JAR.
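Because the CRT JARs are left out of the fat JAR, a missing JAR only surfaces once the CRT path is taken. A fail-fast check such as the hypothetical one below (not part of this PR) could report the problem clearly. It assumes `software.amazon.awssdk.http.crt.AwsCrtHttpClient` (from `aws-crt-client`) and `software.amazon.awssdk.crt.CRT` (from `aws-crt`) as the entry-point classes to probe, and it calls `Class.forName` with `initialize = false` so that no static initializer, such as native library loading, runs during the check.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical fail-fast check: verifies that the unshaded CRT JARs are
 * visible to the plugin's class loader before the CRT transport is used.
 */
final class CrtClasspathCheck {

    // Assumed entry-point classes of aws-crt-client and aws-crt, respectively.
    static final List<String> REQUIRED_CLASSES = List.of(
            "software.amazon.awssdk.http.crt.AwsCrtHttpClient",
            "software.amazon.awssdk.crt.CRT");

    private CrtClasspathCheck() {}

    /** Returns the class names that the given loader cannot resolve. */
    static List<String> missingClasses(List<String> classNames, ClassLoader loader) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize = false: resolve the class without running static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Throws if CRT is enabled but its JARs are not visible to this plugin. */
    static void requireCrtOnClasspath(boolean crtEnabled) {
        if (!crtEnabled) {
            return; // The default Apache + Netty path needs no extra JARs.
        }
        List<String> missing =
                missingClasses(REQUIRED_CLASSES, CrtClasspathCheck.class.getClassLoader());
        if (!missing.isEmpty()) {
            throw new IllegalStateException(
                    "s3.crt.enabled is true but these classes were not found: " + missing
                            + ". Place the aws-crt-client and aws-crt JARs in the plugin directory.");
        }
    }

    public static void main(String[] args) {
        requireCrtOnClasspath(false); // no-op when CRT is disabled
        System.out.println(
                missingClasses(REQUIRED_CLASSES, CrtClasspathCheck.class.getClassLoader()));
    }
}
```

The lookup goes through the plugin's own class loader, because Flink loads each plugin in an isolated class loader and that is where the user-supplied CRT JARs would be visible.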
   
## Brief change log
- Added `software.amazon.awssdk:aws-crt-client` as a `provided`-scope (compile-only) dependency, excluded from shading
- Added two new config options to `NativeS3FileSystemFactory`:
  - `s3.crt.enabled` (boolean, default `false`): switches both the sync and the async HTTP transport to CRT
  - `s3.crt.target-throughput-gbps` (double, default `10.0`): tunes the target throughput of the CRT async client
- Extended `S3ClientProvider.Builder` with `useCrt`, `crtTargetThroughputGbps`, and `crtMinPartSizeInBytes` fields; `build()` branches on `useCrt` to construct either the CRT clients or the existing Apache + Netty clients
- The CRT async client is configured with `forcePathStyle`, `checksumValidationEnabled`, `S3CrtRetryConfiguration`, and a `maxConcurrency` drawn from the existing connection config; `minimumPartSizeInBytes` maps from the existing `s3.upload.min.part.size` setting
- Updated `CLAUDE.md` with CRT setup instructions and the rationale for the shading constraint
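The option handling above can be sketched as follows. This is a hypothetical, stdlib-only illustration: it reads raw string values from a `Map` instead of Flink's `Configuration`/`ConfigOption` API, and `CrtSettings` and `fromConfig` are illustrative names. The keys and defaults (`false`, `10.0`) come from the change log; the positive-throughput validation is an added assumption, and `s3.upload.min.part.size` is assumed, for simplicity, to be a plain byte count (the real option may accept a different format).

```java
import java.util.Map;

/** Hypothetical holder for the CRT-related settings described in the change log. */
record CrtSettings(boolean useCrt, double crtTargetThroughputGbps, Long crtMinPartSizeInBytes) {

    static final String CRT_ENABLED = "s3.crt.enabled";
    static final String CRT_TARGET_THROUGHPUT_GBPS = "s3.crt.target-throughput-gbps";
    static final String UPLOAD_MIN_PART_SIZE = "s3.upload.min.part.size";

    static final boolean DEFAULT_CRT_ENABLED = false;
    static final double DEFAULT_TARGET_THROUGHPUT_GBPS = 10.0;

    /** Reads the options from raw key/value pairs, applying the documented defaults. */
    static CrtSettings fromConfig(Map<String, String> conf) {
        String enabled = conf.get(CRT_ENABLED);
        boolean useCrt =
                enabled == null ? DEFAULT_CRT_ENABLED : Boolean.parseBoolean(enabled.trim());

        String gbpsValue = conf.get(CRT_TARGET_THROUGHPUT_GBPS);
        double gbps = gbpsValue == null
                ? DEFAULT_TARGET_THROUGHPUT_GBPS
                : Double.parseDouble(gbpsValue.trim());
        if (!(gbps > 0.0)) { // written this way so NaN is rejected too (assumed validation)
            throw new IllegalArgumentException(
                    CRT_TARGET_THROUGHPUT_GBPS + " must be positive, got " + gbps);
        }

        // null means "not configured"; otherwise it would feed minimumPartSizeInBytes.
        String partSize = conf.get(UPLOAD_MIN_PART_SIZE);
        Long minPartSize = partSize == null ? null : Long.parseLong(partSize.trim());

        return new CrtSettings(useCrt, gbps, minPartSize);
    }

    public static void main(String[] args) {
        CrtSettings s = fromConfig(Map.of(CRT_ENABLED, "true", CRT_TARGET_THROUGHPUT_GBPS, "25"));
        // prints "true 25.0 null"
        System.out.println(s.useCrt() + " " + s.crtTargetThroughputGbps() + " " + s.crtMinPartSizeInBytes());
    }
}
```

Resolving everything into one immutable value before building clients keeps the `useCrt` branch in the builder free of parsing logic.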
                                     
## Verifying this change

This change added tests and can be verified as follows:

- Added `testCrtDisabledByDefault()` in `S3ClientProviderTest`: asserts that `isUseCrt()` is `false` when no CRT config is provided
- Added `testCrtFlagIsRecorded()` in `S3ClientProviderTest`: asserts that `isUseCrt()` is `true` and that `getCrtTargetThroughputGbps()` reflects the configured value when `useCrt(true)` is set on the builder
- All existing module tests continue to pass (`mvn verify` is clean)
- Functional end-to-end verification requires placing the `aws-crt-client` and `aws-crt` JARs in the plugin directory and pointing a Flink job at an S3 (or MinIO) endpoint with `s3.crt.enabled: true`
            
## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): **yes** (adds `software.amazon.awssdk:aws-crt-client` as a `provided`-scope, compile-only dependency; not bundled in the fat JAR)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **no**
- The S3 file system connector: **yes** (`flink-s3-fs-native` only; `flink-s3-fs-hadoop` and `flink-s3-fs-presto` are unaffected)
                    
## Documentation

- Does this pull request introduce a new feature? **yes**
- If yes, how is the feature documented? **JavaDocs** (config option descriptions in `NativeS3FileSystemFactory`)
                                                               
---

##### Was generative AI tooling used to co-author this PR?
- [ ] Yes

