alamb commented on issue #9493: URL: https://github.com/apache/arrow-datafusion/issues/9493#issuecomment-1997126661
As we were discussing this API internally with @tustvold, one thing he pointed out is that the current code pretty much requires using the same tokio threadpool for compute (parquet encoding) and I/O (the object store multi-part write). This can cause various problems, depending on what your system is doing (for example, long-running encoding work can delay I/O tasks scheduled on the same threads).

Some discussion on CPU bound work in tokio: https://thenewstack.io/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/

Thus, one thing that would be nice to think about in this API is how we can support doing the I/O (e.g. `put_multipart`) on a different threadpool (aka tokio Runtime).

I believe @tustvold has also been thinking about this in the context of https://github.com/apache/arrow-rs/issues/5458 and may even be planning on porting some/all of the parallelized parquet writer upstream to parquet (I don't fully know the plan yet).

Therefore, as we go through this exercise, we may want to help / join forces upstream / take those plans into account as we figure out the right API to extract.
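To make the separation concrete, here is a minimal sketch of the shape being discussed: encoding stays on the calling thread while the "upload" runs on a dedicated I/O thread. It deliberately uses plain `std` threads and a bounded channel rather than tokio so that it has no dependencies, and every name in it (`Part`, `encode_batch`, `write_with_separate_io`) is hypothetical and not part of DataFusion, parquet, or object_store. The upload is simulated by counting bytes.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for one encoded chunk (e.g. a serialized row group).
type Part = Vec<u8>;

/// Hypothetical CPU-bound encoder; stands in for parquet encoding.
fn encode_batch(batch: &[u32]) -> Part {
    batch.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Encodes on the calling thread and hands each part to a dedicated I/O thread.
/// Returns (number of parts, total bytes) as seen by the I/O side.
fn write_with_separate_io(batches: Vec<Vec<u32>>) -> (usize, usize) {
    // Bounded channel: applies backpressure so the encoder cannot run
    // arbitrarily far ahead of the (slower) I/O side.
    let (tx, rx) = mpsc::sync_channel::<Part>(2);

    // I/O side: in a real system this is where the multipart upload would
    // happen; here the upload is simulated by counting what arrives.
    let io = thread::spawn(move || {
        let mut parts: usize = 0;
        let mut bytes: usize = 0;
        for part in rx {
            parts += 1;
            bytes += part.len();
        }
        (parts, bytes)
    });

    // Compute side: CPU-bound encoding never runs on the I/O thread.
    for batch in &batches {
        tx.send(encode_batch(batch)).expect("I/O thread hung up");
    }
    drop(tx); // closing the channel lets the I/O loop finish

    io.join().expect("I/O thread panicked")
}
```

With tokio, the analogous structure would presumably be a second `Runtime` whose `Handle` is used to spawn the upload tasks, with a bounded channel between the two sides; the sketch above only illustrates the division of work, not a proposed API.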
