alamb commented on PR #23026: URL: https://github.com/apache/datafusion/pull/23026#issuecomment-4793157335
> I’ll write more about the intra-operator parallelism approach later and work on the implementation. For now, I’m trying to make the case that we should do intra-operator parallelism first, and I’m happy to help with this PR’s approach afterward. TL;DR: this order likely requires less total engineering effort.

I think we need to think about how to do this carefully -- right now the execution model in DataFusion is basically single core --> single partition, and the cores share very little. If we start allowing more threads than cores (which could happen if we have operators launch tasks, for example), we may find that thread context switching, etc. sucks down performance in a way we didn't expect.
