westonpace commented on PR #13028:
URL: https://github.com/apache/arrow/pull/13028#issuecomment-1120077381

   Sure.  "Thread per core" is probably a bit of a misnomer too, but I haven't 
found a nicer term yet.  The default thread pool size is 
std::thread::hardware_concurrency(), which is the maximum number of concurrent 
threads the hardware supports, so we do not over-allocate threads.
   
   When dealing with I/O you normally want to make sure the system is doing 
useful work while the I/O is happening.  One possible solution is the 
synchronous approach: create a pool with many more threads than your CPU has 
cores.  When a thread encounters I/O it simply blocks synchronously and lets 
the OS schedule a different thread onto the hardware.
   
   We don't do that today.  Instead we take an asynchronous approach.  To 
implement this we actually have two thread pools.  The I/O thread pool is sized 
based on how many concurrent I/O requests make sense (e.g. not very many for an 
HDD, quite a lot for S3).  These threads are expected to spend most of their 
time in a waiting state.
   
   The second thread pool (the one that, by default, drives the execution 
engine) is the CPU thread pool.  This thread pool (again, by default) has a 
fixed size based on the processor hardware.  It's very important not to block a 
CPU thread, because doing so usually means you are underutilizing the hardware.

