neilconway commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4162738604
@asolimando Thanks for the thoughtful comment!

> I am not aware of any database going down this route

This technique is widely used: Postgres does almost exactly what this PR implements, and MySQL and Oracle both do very similar things. "Evaluate once and cache" is a very common approach for evaluating uncorrelated scalar subqueries, although when and how the evaluation and caching happen differs (e.g., I believe CockroachDB does the subquery evaluation during query planning).

On point 2: leaving uncorrelated subqueries in the logical plan means some plan rewriting rules will need to handle them, that is true. Most optimization rules don't need changes: of the ~23 current rules, we've so far had to modify 3 (those that do custom recursion via `apply_order() = None`), updating them to recurse into subqueries as appropriate. In general, uncorrelated scalar subqueries are such a simple construct (single return value, no dependency between the subquery and the parent plan) that I think the query rewriting concerns shouldn't be too bad. I think the situation would be different if we were talking about preserving a broader class of subqueries.

On the EnforceDistribution issue specifically, based on a quick read I couldn't see how preserving uncorrelated scalar subqueries in logical plans would make the bug any harder (or easier) to resolve. If you have something specific in mind here, can you elaborate?
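For readers less familiar with the "evaluate once and cache" idea mentioned above, here is a minimal, self-contained sketch in plain Rust. It is an illustration only, not the code in this PR and not DataFusion's API: `CachedScalarSubquery` and `filter_greater_than_subquery` are hypothetical names, and the "subquery" is just a closure computing `max` over a slice, standing in for something like `SELECT * FROM t WHERE x > (SELECT max(y) FROM u)`.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical stand-in for an uncorrelated scalar subquery: it produces a
/// single (possibly NULL) value and does not depend on any row of the parent.
struct CachedScalarSubquery<F: Fn() -> Option<i64>> {
    compute: F,
    cached: OnceCell<Option<i64>>,
}

impl<F: Fn() -> Option<i64>> CachedScalarSubquery<F> {
    fn new(compute: F) -> Self {
        Self { compute, cached: OnceCell::new() }
    }

    /// Runs the subquery on first use; later calls return the cached result.
    fn value(&self) -> Option<i64> {
        *self.cached.get_or_init(|| (self.compute)())
    }
}

/// Keeps the rows satisfying `row > (SELECT max(inner))` and also returns how
/// many times the subquery was actually evaluated.
fn filter_greater_than_subquery(rows: &[i64], inner: &[i64]) -> (Vec<i64>, usize) {
    let evaluations = Cell::new(0usize);
    let subquery = CachedScalarSubquery::new(|| {
        evaluations.set(evaluations.get() + 1);
        // `max` over an empty input is modeled as NULL (None).
        inner.iter().copied().max()
    });

    let kept: Vec<i64> = rows
        .iter()
        .copied()
        // A NULL subquery result makes the comparison fail for every row.
        .filter(|row| subquery.value().map_or(false, |max| *row > max))
        .collect();

    (kept, evaluations.get())
}
```

The predicate is checked once per outer row, but the closure behind the subquery runs at most once, which is the property that makes this strategy cheap for uncorrelated scalar subqueries. Where the cached value lives and when it is computed (planning time versus first use at execution time) is exactly the part that differs between systems.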
