Izeren commented on PR #27788:
URL: https://github.com/apache/flink/pull/27788#issuecomment-4133740922
I ran Claude on it, and it has some good points. I will have a deeper look
later today or tomorrow
```
PR: [FLINK-39166] Add bucket-level configuration support to Native S3
FileSystem
Author: Samrat002
Verdict: REQUEST CHANGES
Critical (2)
1. Per-bucket clients don't inherit global config — When a bucket-specific
config is defined, settings like maxRetries, maxConnections,
credentialsProviderClasses, and assumeRoleSessionDuration silently fall back
to hardcoded defaults instead of inheriting the globally configured
values. For example, a global s3.retry.max-num-retries: 10 would be ignored —
per-bucket clients retry only 3 times.
2. S3ClientProviderCache is architecturally unnecessary — Flink's
FileSystem.getUnguardedFileSystem() already caches filesystem instances by
FSKey(scheme, authority), where authority = bucket name. Each
NativeS3FileSystem instance serves exactly one bucket, so the cache with
its HashMap, lock, and async close logic will never hold more than one entry.
This adds complexity and synchronization overhead for no
benefit.
Major (5)
3. Mixed concerns in one commit — Bundles unrelated
CONNECTION_TIMEOUT/SOCKET_TIMEOUT config options and bulkCopy constructor
refactoring with bucket-level config. Should be separate commits.
4. Missing documentation — The primary feature
(s3.bucket.<name>.<property> config format) is completely undocumented in the
README. Only 2 lines were added for timeout config.
5. No tests for S3ClientProviderCache — Has non-trivial concurrent logic
(synchronized access, closed-state check, async close with timeout) but zero
test coverage.
6. Three telescoping constructors with 10-15 params each — Fragile API;
should use a builder pattern or a single constructor since only the factory
instantiates it.
7. No validation for partial credentials — Setting access-key without
secret-key (or vice versa) silently creates a broken config that will fail at
runtime with a cryptic AWS SDK error instead of a clear
config-time error.
Minor (5)
8. BucketConfigProvider should be package-private (only used within its
package)
9. S3BucketConfig.Builder constructor is public but outer class is
package-private
10. Duplicate null-handling for bucketConfig in both caller and callee
11. closed field in cache missing @GuardedBy("cacheLock") annotation
12. Lambda (config) -> createClientProviderForBucketConfig(config) can be
a method reference
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]