neilconway commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4177985136
> But I think this is not what we want ideally - we want to run few independent pipelines as possible, and get (data) parallelism from the individual pipelines rather than executing all at the same time.

I don't disagree 😊 But for the purposes of this PR, we will regress performance on some benchmark queries if we don't do some additional work to get the same degree of overlapping that the cross-join path gets today. Is that something we're okay with?

I don't think the additional complexity to overlap subquery evaluation with main query evaluation is too bad (via `WaitForSubqueryExec`), but if we're going to land morsel-driven parallelism soon-ish (🎉🎉🎉), maybe that will solve this problem in a cleaner / more general way and we can keep the subquery eval stuff simpler.

Let me know what you think @Dandandan
