fjetter commented on issue #38389:
URL: https://github.com/apache/arrow/issues/38389#issuecomment-1775100986

> Yes, it seems we are using `OMP_NUM_THREADS` (and otherwise check `std::thread::hardware_concurrency()`, which I think also doesn't always give the correct number, e.g. in a container), see the relevant [code](https://github.com/apache/arrow/blob/37935604bf168a3b2d52f3cc5b0edf83b5783309/cpp/src/arrow/util/thread_pool.cc#L705-L721).
> You can also manually override this with `pa.set_cpu_count()`.

Yes, thanks. I had already found that pyarrow sets the CPU thread pool size to one inside Dask, regardless of the environment settings. I also tested `set_cpu_count` a little, but so far we haven't seen the hoped-for speedup.
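
For reference, here is a rough Python approximation of the default-capacity logic described in the quote above: read `OMP_NUM_THREADS` first, fall back to a hardware-concurrency count otherwise, and compare that with the CPUs the process is actually allowed to use. This is a sketch under assumptions, not Arrow's implementation: the function names are hypothetical, the comma-list parsing follows the OpenMP convention and was not checked line by line against the linked C++, and `os.cpu_count()` stands in for `std::thread::hardware_concurrency()`.

```python
import os


def parse_omp_env_var(value):
    """Return the first (top-level) integer of an OMP_*-style value, or 0.

    Assumption: like OpenMP, OMP_NUM_THREADS may be a comma-separated list
    such as "4,2"; only the first entry is used. Unset, negative, or
    unparsable values yield 0 (meaning "not set").
    """
    if not value:
        return 0
    first = value.split(",", 1)[0].strip()
    try:
        return max(0, int(first))
    except ValueError:
        return 0


def approx_default_capacity(environ=None):
    """Hypothetical approximation of Arrow's default CPU thread pool size."""
    environ = os.environ if environ is None else environ
    capacity = parse_omp_env_var(environ.get("OMP_NUM_THREADS"))
    if capacity == 0:
        # Stand-in for std::thread::hardware_concurrency(): counts the host's
        # CPUs, not necessarily the ones this process may run on.
        # os.cpu_count() can return None; this sketch then falls back to 1.
        capacity = os.cpu_count() or 1
    return capacity


def usable_cpu_count():
    """CPUs this process may run on (respects Linux CPU affinity/cpusets)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("OMP_NUM_THREADS=1:", approx_default_capacity({"OMP_NUM_THREADS": "1"}))
    print("OMP_NUM_THREADS unset:", approx_default_capacity({}))
    print("usable CPUs:", usable_cpu_count())
```

Under these assumptions, a process whose environment has `OMP_NUM_THREADS=1` ends up with a one-thread pool, which is consistent with what we saw inside Dask. pyarrow exposes the actual values through `pa.cpu_count()` and `pa.set_cpu_count(n)`. Note that on Linux `os.sched_getaffinity` reflects cpusets but not CFS quotas (e.g. `docker --cpus`), so even the affinity-based count can overcount in a container.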


-- 
This is an automated message from the Apache Git Service.