MuteBardTison opened a new pull request, #46034:
URL: https://github.com/apache/arrow/pull/46034

   ### Rationale for this change
   
   Arrow currently detects the CPU thread count with 
`std::thread::hardware_concurrency()`, which does not take the process-level 
CPU affinity mask into account (e.g., a mask set via `taskset`). This can lead 
to thread oversubscription and degraded performance when Arrow runs in 
constrained environments.
   
   This PR updates the internal `CpuInfo` logic to use `sched_getaffinity()` on 
Linux, ensuring Arrow respects the number of cores actually available to the 
process.
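
The effect of an affinity mask can be reproduced without `taskset`. The sketch below is illustrative only and is not code from this PR: `AffinityCpuCount()` and `PinToSingleCpu()` are hypothetical helper names, and it assumes Linux with glibc, where `<sched.h>` provides `sched_getaffinity()`, `sched_setaffinity()` and the `CPU_*` macros.

```cpp
#include <cassert>
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux/glibc)

// Hypothetical helper: number of CPUs in the caller's affinity mask,
// or -1 if the mask cannot be read (for example, on machines with more
// CPUs than a fixed-size cpu_set_t can describe).
int AffinityCpuCount() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
    return -1;
  }
  int count = CPU_COUNT(&mask);
  assert(count <= CPU_SETSIZE);  // a fixed-size set cannot hold more bits
  return count;
}

// Hypothetical helper: restrict the caller to the first CPU it is
// currently allowed to run on, similar in effect to `taskset -c <cpu>`.
bool PinToSingleCpu() {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
    return false;
  }
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &allowed)) {
      cpu_set_t single;
      CPU_ZERO(&single);
      CPU_SET(cpu, &single);
      return sched_setaffinity(0, sizeof(single), &single) == 0;
    }
  }
  return false;  // empty mask: nothing to pin to
}
```

After a successful `PinToSingleCpu()`, `AffinityCpuCount()` reports 1, whereas `std::thread::hardware_concurrency()` may, depending on the standard library and libc in use, still report the machine-wide count. That gap is what leads to an oversized thread pool.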
   
   ### What changes are included in this PR?
   
   - Added `affinity.h` to expose `GetAffinityCpuCount()` on Linux
   - Updated `CpuInfo::Impl` in `cpu_info.cc` to use `GetAffinityCpuCount()` 
instead of raw `std::thread::hardware_concurrency()`
   - Added a new unit test in `cpu_info_test.cc` to validate this behavior 
against `sched_getaffinity()` on Linux
   - Guarded the Linux-specific code with `#ifdef __linux__` so that other platforms still compile and keep their existing behavior
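
As a rough illustration of the shape such a helper could take, here is a minimal sketch. It is not the code in this PR: the name `GetAffinityCpuCountSketch()`, its `int` return type, and the fallback policy are assumptions made for the example, and the actual `GetAffinityCpuCount()` in `affinity.h` may differ.

```cpp
#include <cassert>
#include <thread>
#ifdef __linux__
#include <sched.h>  // sched_getaffinity, CPU_ZERO, CPU_COUNT (glibc)
#endif

// Hypothetical stand-in for the PR's GetAffinityCpuCount().
// Returns the number of CPUs the process may run on, and never less than 1.
int GetAffinityCpuCountSketch() {
#ifdef __linux__
  cpu_set_t mask;
  CPU_ZERO(&mask);
  // pid 0 means "the calling thread"; on failure, fall through to the
  // portable path below instead of reporting an error.
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    int count = CPU_COUNT(&mask);
    assert(count <= CPU_SETSIZE);
    if (count > 0) {
      return count;
    }
  }
#endif
  // Portable fallback: hardware_concurrency() is allowed to return 0
  // when the value is not computable, so clamp to at least one core.
  unsigned int hw = std::thread::hardware_concurrency();
  return hw > 0 ? static_cast<int>(hw) : 1;
}
```

Falling back rather than failing keeps non-Linux builds on their previous behavior, and the clamp to 1 means a caller sizing a thread pool never sees zero.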
   
   ### Are these changes tested?
   
   Yes ✅  
   A Linux-only unit test (`CpuInfoTest.CpuAffinity`) compares the result of 
`CpuInfo::num_cores()` with the actual CPU affinity mask from 
`sched_getaffinity()`.
   
   ### Are there any user-facing changes?
   
   No changes to public APIs.  
   Behavioral changes are limited to internal CPU thread detection logic on 
Linux.
   
   ---
   
   Original issue: https://github.com/apache/arrow/issues/45860


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
