azhu248 opened a new pull request, #49810:
URL: https://github.com/apache/arrow/pull/49810

   This sets the Google Cloud client's `ConnectionPoolSizeOption` from 
Arrow's IO thread pool capacity, obtained via the `io_context`, to improve 
parallel read throughput against cloud blob storage. It also adds a test 
covering the fallback to the global IO thread pool capacity.
   
   Closes #20314
   
   ### Rationale for this change
   
   Multithreaded read performance can be bottlenecked by the Google Cloud 
client library's default connection pool size. Rather than exposing a new 
option solely for this, this PR derives the pool size from the Arrow I/O 
thread pool capacity.
   
   ### What changes are included in this PR?
   
   - Extended the initialization path to pass `io_context` down to 
`internal::AsGoogleCloudOptions()`.
   - Dynamically assigned `gcs::ConnectionPoolSizeOption` from 
`io_context.executor()->GetCapacity()` or fell back safely to 
`::arrow::io::GetIOThreadPoolCapacity()`.
   - Enforced a minimum connection pool size of `4` using `std::max`, so 
that users with a small thread pool (e.g. capacity set to `1`) are not 
accidentally penalized.
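
   The sizing rule described in the list above can be sketched in isolation. 
This is a minimal standalone illustration, not the code in this PR: 
`FakeExecutor`, `kMinConnectionPoolSize`, and `ResolveConnectionPoolSize` are 
hypothetical names, and a null pointer stands in for the case where no 
executor capacity is available and the global IO thread pool capacity is used 
instead.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for an executor that reports its thread capacity.
struct FakeExecutor {
  int capacity;
  int GetCapacity() const { return capacity; }
};

// Hypothetical constant mirroring the minimum of 4 described above.
constexpr int kMinConnectionPoolSize = 4;

// Hypothetical helper: use the executor's capacity when an executor is
// given, otherwise the global IO capacity, then apply the floor.
std::size_t ResolveConnectionPoolSize(const FakeExecutor* executor,
                                      int global_io_capacity) {
  const int capacity =
      executor != nullptr ? executor->GetCapacity() : global_io_capacity;
  return static_cast<std::size_t>(std::max(kMinConnectionPoolSize, capacity));
}
```

   With this sketch, an executor capacity of `1` yields a pool size of `4`, 
while a capacity of `32` passes through unchanged.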
   
   ### Are these changes tested?
   
   Yes. I added the unit test `OptionsConnectionPoolSizeFallback` to 
`gcsfs_test.cc` that validates:
   - The fallback logic defaults correctly to the system's global IO thread 
pool.
   - Changing the thread pool capacity via 
`arrow::io::SetIOThreadPoolCapacity(...)` is reflected in the generated 
Google Cloud option.
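
   The shape of such a check can be sketched without Arrow or the Google 
Cloud client. Everything in the `sketch` namespace is a hypothetical 
stand-in: `g_io_capacity` models the global IO thread pool capacity, and 
`ScopedIoCapacity` is an assumed helper that restores the previous capacity so 
one test does not leak state into another. Whether the actual test restores 
the capacity this way is not stated in this PR.

```cpp
#include <algorithm>
#include <cstddef>

namespace sketch {

// Hypothetical stand-in for the process-wide IO thread pool capacity.
int g_io_capacity = 8;
int GetIoCapacity() { return g_io_capacity; }
void SetIoCapacity(int threads) { g_io_capacity = threads; }

// Derives the pool size from the current global capacity, with a floor of 4.
std::size_t PoolSizeFromGlobalCapacity() {
  return static_cast<std::size_t>(std::max(4, GetIoCapacity()));
}

// Sets a capacity for the current scope and restores the old one on exit.
class ScopedIoCapacity {
 public:
  explicit ScopedIoCapacity(int threads) : previous_(GetIoCapacity()) {
    SetIoCapacity(threads);
  }
  ~ScopedIoCapacity() { SetIoCapacity(previous_); }
  ScopedIoCapacity(const ScopedIoCapacity&) = delete;
  ScopedIoCapacity& operator=(const ScopedIoCapacity&) = delete;

 private:
  int previous_;
};

}  // namespace sketch
```

   A test built on this pattern would read the derived size, change the 
capacity inside a scope, confirm the derived size follows it, and then confirm 
the original value is back once the scope ends.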
   
   ### Are there any user-facing changes?
   
   No breaking API changes.
   
   
   

