devinjdangelo commented on issue #9493: URL: https://github.com/apache/arrow-datafusion/issues/9493#issuecomment-1997342596
> As we were discussing this API internally with @tustvold one thing he pointed out is that the current code pretty much requires using the same tokio threadpool for compute (parquet encoding) and I/O (the object store multi-part write). This can cause various problems, depending on what your system is doing.

I think the parallel writer facilitates a fairly straightforward way to move the most CPU-heavy work (serialization) to a separate thread. Here is a draft PR #9605.

I think the main objection to using `spawn_blocking` here is that tokio doesn't actually manage a blocking thread pool, but just keeps spawning new threads. Rayon or a second tokio runtime would likely perform better in a case where we needed hundreds of parallel tasks (e.g. 4 open row groups on a parquet file with 64 columns on a system with only 8 CPU cores).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
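To make the sizing concern in the comment concrete: 4 open row groups times 64 columns is 256 serialization tasks competing for 8 cores. The sketch below is only an illustration of the "fixed pool" alternative, written against the standard library so it runs without extra crates. It is not the code from the draft PR, and it uses neither Rayon nor tokio. `encode_column` and `encode_on_fixed_pool` are hypothetical names, and the "encoding" is placeholder arithmetic standing in for parquet serialization.

```rust
use std::collections::HashSet;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for CPU-heavy column serialization.
fn encode_column(row_group: usize, column: usize) -> u64 {
    let mut acc = (row_group * 64 + column) as u64;
    for i in 0..1_000u64 {
        acc = acc.wrapping_mul(31).wrapping_add(i);
    }
    acc
}

/// Runs `row_groups * columns` encode jobs on a fixed pool of `workers`
/// threads. Returns (jobs completed, distinct threads that did work).
fn encode_on_fixed_pool(row_groups: usize, columns: usize, workers: usize) -> (usize, usize) {
    let (job_tx, job_rx) = mpsc::channel::<(usize, usize)>();
    // The receiver is shared so that idle workers pull the next job.
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = mpsc::channel::<(thread::ThreadId, u64)>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so encoding itself runs without holding the lock.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok((rg, col)) => {
                        let encoded = encode_column(rg, col);
                        done_tx.send((thread::current().id(), encoded)).unwrap();
                    }
                    // The job sender was dropped and the queue is empty.
                    Err(_) => break,
                }
            })
        })
        .collect();
    // Drop the original sender so `done_rx` ends once all workers exit.
    drop(done_tx);

    // Queue every (row group, column) pair; no thread is spawned per job.
    for rg in 0..row_groups {
        for col in 0..columns {
            job_tx.send((rg, col)).unwrap();
        }
    }
    drop(job_tx);

    let mut completed = 0;
    let mut threads = HashSet::new();
    for (id, _encoded) in done_rx {
        completed += 1;
        threads.insert(id);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    (completed, threads.len())
}

fn main() {
    let (completed, threads_used) = encode_on_fixed_pool(4, 64, 8);
    println!("{completed} column chunks encoded on {threads_used} threads");
}
```

The point of the sketch is that the number of OS threads stays bounded by `workers` no matter how many column tasks are queued, which is the property a Rayon pool or a dedicated second runtime would provide in a real implementation.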
