Re: [PR] Move parallel parquet serialization to blocking threads [arrow-datafusion]

via GitHub Thu, 14 Mar 2024 11:42:28 -0700


alamb commented on PR #9605:
URL: 
https://github.com/apache/arrow-datafusion/pull/9605#issuecomment-1998097464


   > So yes in an ideal world all CPU computation would be spawned to rayon or 
a similar blocking threadpool as in this PR. However, unfortunately this isn't 
the way DF has been implemented.
   
   I will continue to agree to disagree on this point. But I think @tustvold 
may be trolling me 🤔 :)
   
   Anyhow, for the record, I think using 2 separate tokio runtimes is perfectly 
acceptable and good, for reasons I have [soapboxed about at 
length](https://thenewstack.io/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/).
 However, using tokio for CPU-bound tasks makes it very easy to schedule 
CPU and IO work on the same thread pool, which becomes a bottleneck in some 
scenarios.
   
   So, in the sense that using a different threadpool API would make it 
impossible to mix IO and CPU work, it would be an improvement. However, I think 
it would be a heavy price to pay.
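
To make the separation concrete, here is a minimal, std-only sketch of keeping CPU-bound work on its own pool so it cannot land on IO threads. It is not the code from this PR and it does not use tokio or rayon; `CpuPool` and `checksum` are hypothetical names invented for illustration, and a real setup would more likely use a second tokio runtime or a blocking threadpool as discussed above.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// A boxed unit of CPU-bound work.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical dedicated pool for CPU-bound work, kept apart from IO threads.
struct CpuPool {
    sender: Option<Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl CpuPool {
    fn new(threads: usize) -> Self {
        let (sender, receiver) = channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the job itself runs without holding the lock.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: pool is shutting down
                    }
                })
            })
            .collect();
        CpuPool { sender: Some(sender), workers }
    }

    /// Run `f` on the CPU pool and return a receiver for its result.
    fn spawn<T, F>(&self, f: F) -> Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller dropped the result receiver.
            let _ = tx.send(f());
        });
        self.sender
            .as_ref()
            .expect("pool is running")
            .send(job)
            .expect("workers are alive");
        rx
    }
}

impl Drop for CpuPool {
    fn drop(&mut self) {
        // Closing the channel lets the workers exit their loops.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}

/// Stand-in for CPU-heavy work such as serialization (illustrative only).
fn checksum(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}

fn main() {
    let pool = CpuPool::new(2);
    let rx = pool.spawn(|| checksum(&[1, 2, 3, 4]));
    println!("checksum = {}", rx.recv().unwrap());
}
```

Because the pool only accepts plain closures, IO futures cannot be scheduled on it by accident. That is the property a different threadpool API would enforce, and the cost is an extra hand-off for every piece of CPU work.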


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
