asolimando commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4162286204
> ## Rationale for this change
> Previously, DataFusion evaluated uncorrelated scalar subqueries by transforming them into joins. This has two shortcomings:
>
> 1. Scalar subqueries that return > 1 row were allowed, producing incorrect query results. Such queries should instead result in a runtime error.
> 2. Performance. Evaluating scalar subqueries as a join requires going through the join machinery. More importantly, it means that UDFs that have special-cases for scalar inputs cannot use those code paths for scalar subqueries, which often results in significantly slower query execution.
>
> This PR introduces physical execution of uncorrelated scalar subqueries:
>
> * Uncorrelated subqueries are left in the plan by the optimizer, not rewritten into joins

I am not aware of any database going down this route, for multiple reasons:

- you are potentially giving up on many transformations that make the plan of the subquery faster (https://github.com/apache/datafusion/pull/21240#issuecomment-4158270781 is one example, but it's probably the tip of the iceberg)
- alternatively, all your planning rules now have to deal with subqueries, which makes them more complicated, and for some of them it's already challenging to prove correctness: https://github.com/apache/datafusion/issues/21174#issue-4143242322 comes to mind as a tricky correctness issue, and it would be far more complex to reason over a plan where subqueries are preserved

Point 1. is a bug in how subquery removal is implemented, not a limitation of subquery removal algorithms, so it shouldn't be used as a motivation for or against the approach.

Point 2. seems a limitation worth addressing by improving the general join path, so that most plans benefit, rather than a blocker specific to subquery removal, but I must admit that I am not aware of the details of the limitations you mention.

This said, my opinion is biased towards the "query planning" side of things, and it might not do justice to the execution perspective you bring up in point 2., but I hope my POV can help with the discussion.
