fjetter commented on issue #38389: URL: https://github.com/apache/arrow/issues/38389#issuecomment-1775100986
> Yes, it seems we are using `OMP_NUM_THREADS` (and otherwise check `std::thread::hardware_concurrency()`, which I think also doesn't always give the correct number, e.g. in a container); see the relevant [code](https://github.com/apache/arrow/blob/37935604bf168a3b2d52f3cc5b0edf83b5783309/cpp/src/arrow/util/thread_pool.cc#L705-L721). You can also manually override this with `pa.set_cpu_count()`.

Yes, thanks. I already found that pyarrow sets the CPU thread pool to one inside of dask, regardless of the env settings. I have already tested a little with `set_cpu_count`, but so far we haven't seen the hoped-for speedup.
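
For anyone who wants to reproduce the override, here is a minimal sketch. The helper `desired_cpu_count` is hypothetical and only loosely follows the order described in the quoted comment (`OMP_NUM_THREADS` first, then a hardware-derived fallback); it is not Arrow's actual implementation. The fallback uses the process affinity mask where available, which does not account for cgroup CPU quotas, so the result may still be off in some containers.

```python
import os


def desired_cpu_count(environ=None):
    """Hypothetical helper: choose a thread count for the Arrow CPU pool.

    Prefers a positive integer in OMP_NUM_THREADS, otherwise falls back to
    the number of CPUs this process may run on.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("OMP_NUM_THREADS", "").strip()
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    # Linux only: respects the affinity mask, but not cgroup CPU quotas.
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    return os.cpu_count() or 1


try:
    import pyarrow as pa
except ImportError:  # keep the sketch importable without pyarrow
    pa = None

if pa is not None:
    print("CPU pool size before:", pa.cpu_count())
    pa.set_cpu_count(desired_cpu_count())
    print("CPU pool size after:", pa.cpu_count())
```

Inside dask this would have to run in each worker process, since the pool size is per process. As noted above, changing the pool size alone has not produced the hoped-for speedup so far.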
