azhu248 opened a new pull request, #49810: URL: https://github.com/apache/arrow/pull/49810
This maps the Google Cloud Client `ConnectionPoolSizeOption` directly to Arrow's IO thread pool capacity via the `io_context`, increasing parallel read throughput for cloud blob systems. It also includes a test covering the fallback thread pool capacity mapping.

Closes #20314

### Rationale for this change

Multithreaded read performance can be artificially bottlenecked by the Google Cloud Client Library's default connection pool size. Instead of exposing an entirely new option solely for this, we tie the connection pool size to the Arrow I/O thread pool capacity.

### What changes are included in this PR?

- Extended the initialization path to pass `io_context` down to `internal::AsGoogleCloudOptions()`.
- Set `gcs::ConnectionPoolSizeOption` from `io_context.executor()->GetCapacity()`, falling back to `::arrow::io::GetIOThreadPoolCapacity()`.
- Enforced a minimum connection pool size of `4` using `std::max`. This avoids penalizing single-threaded users (e.g. users with capacity set to `1`).

### Are these changes tested?

Yes. I added the unit test `OptionsConnectionPoolSizeFallback` to `gcsfs_test.cc`, which validates that:

- The fallback logic defaults to the global IO thread pool capacity.
- Modifying the thread pool via `arrow::io::SetIOThreadPoolCapacity(...)` updates the generated Google Cloud option accordingly.

### Are there any user-facing changes?

No breaking API changes.
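The sizing rule described under "What changes are included in this PR?" can be sketched in isolation as follows. This is a minimal illustration, not the code from the PR: `ComputeConnectionPoolSize` and `kMinConnectionPoolSize` are hypothetical names, and the condition that triggers the fallback is an assumption, modeled here as an absent executor capacity. Arrow and Google Cloud types are deliberately left out so the snippet compiles on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>

// Hypothetical constant: the minimum pool size of 4 named in the PR description.
constexpr int kMinConnectionPoolSize = 4;

// Hypothetical helper illustrating the rule from the description.
// `executor_capacity` stands in for io_context.executor()->GetCapacity();
// an empty optional models the case where that value is unavailable (assumption).
// `global_io_capacity` stands in for ::arrow::io::GetIOThreadPoolCapacity().
std::size_t ComputeConnectionPoolSize(std::optional<int> executor_capacity,
                                      int global_io_capacity) {
  // Prefer the executor's capacity, otherwise use the global IO pool capacity.
  const int capacity = executor_capacity.value_or(global_io_capacity);
  // Clamp from below so low-capacity (e.g. single-threaded) setups
  // still get at least the minimum number of connections.
  return static_cast<std::size_t>(std::max(kMinConnectionPoolSize, capacity));
}
```

Under these assumptions, a capacity of `1` yields a pool size of `4`, while a capacity of `16` yields `16`; the result would then be the value passed to `gcs::ConnectionPoolSizeOption`.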
