andygrove commented on code in PR #3802:
URL: https://github.com/apache/datafusion-comet/pull/3802#discussion_r3001298235
##########
native/core/src/parquet/parquet_support.rs:
##########
@@ -444,6 +447,32 @@ fn create_hdfs_object_store(
})
}
+type ObjectStoreCache = RwLock<HashMap<(String, u64), Arc<dyn ObjectStore>>>;
+
+/// Global cache of object stores keyed by (url_key, config_hash).
+///
+/// This avoids creating a new S3 client (and thus a new HTTP connection pool
and DNS
+/// resolution) for every Parquet file read. Without this cache, each call to
+/// `initRecordBatchReader` from the JVM creates a fresh `reqwest::Client`,
leading to
+/// excessive DNS queries that can overwhelm DNS resolvers (e.g., Route53
Resolver limits
+/// in EKS environments).
+fn object_store_cache() -> &'static ObjectStoreCache {
+ static CACHE: OnceLock<ObjectStoreCache> = OnceLock::new();
Review Comment:
Added docs, but I have not yet confirmed that the part about credential
provider interaction is actually correct, so moved to draft for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]