alamb commented on code in PR #22530:
URL: https://github.com/apache/datafusion/pull/22530#discussion_r3320643498
##########
datafusion/common/src/config.rs:
##########
@@ -1124,6 +1124,21 @@ config_namespace! {
/// into the file scan phase.
pub enable_topk_dynamic_filter_pushdown: bool, default = true
+ /// When set to true, uncorrelated scalar subqueries are
+ /// left in the logical plan and executed by `ScalarSubqueryExec` during
+ /// physical execution. When set to false, all scalar subqueries
+ /// (including uncorrelated ones) are rewritten to left joins by the
+ /// `ScalarSubqueryToJoin` optimizer rule.
+ ///
+ /// Note disabling this option is not recommended. It restores
+ /// pre-PR-21240 behavior, which silently produces incorrect results for
+ /// multi-row subqueries and does not support scalar subqueries in
+ /// ORDER BY / JOIN ON / aggregate-function arguments. This option is
+ /// intended as a temporary escape hatch for distributed execution
+ /// frameworks and is planned to be removed in a future DataFusion
+ /// release.
Review Comment:
As a nit, it would be nice to add a link to the PR:
https://github.com/apache/datafusion/pull/21240
```suggestion
/// Note disabling this option is not recommended. It restores
/// pre https://github.com/apache/datafusion/pull/21240
/// behavior, which silently produces incorrect results for
/// multi-row subqueries and does not support scalar subqueries in
/// ORDER BY / JOIN ON / aggregate-function arguments. This option is
/// intended as a temporary escape hatch for distributed execution
/// frameworks and is planned to be removed in a future DataFusion
/// release.
```
##########
datafusion/common/src/config.rs:
##########
@@ -1124,6 +1124,21 @@ config_namespace! {
/// into the file scan phase.
pub enable_topk_dynamic_filter_pushdown: bool, default = true
+ /// When set to true, uncorrelated scalar subqueries are
+ /// left in the logical plan and executed by `ScalarSubqueryExec` during
+ /// physical execution. When set to false, all scalar subqueries
+ /// (including uncorrelated ones) are rewritten to left joins by the
+ /// `ScalarSubqueryToJoin` optimizer rule.
+ ///
Review Comment:
For my understanding: if this flag is enabled, does it
1. Restore DataFusion 53 behavior (which can be wrong in some cases)?
2. Introduce some new ways to produce incorrect results?
##########
datafusion/optimizer/src/scalar_subquery_to_join.rs:
##########
@@ -1159,4 +1197,53 @@ mod tests {
"
)
}
+
+ #[test]
+ fn uncorrelated_scalar_subquery_rewritten_when_flag_off() -> Result<()> {
+ use datafusion_common::config::ConfigOptions;
+
+ let sq = Arc::new(
+ LogicalPlanBuilder::from(scan_tpch_table("orders"))
+ .aggregate(Vec::<Expr>::new(), vec![max(col("orders.o_custkey"))])?
+ .project(vec![max(col("orders.o_custkey"))])?
+ .build()?,
+ );
+
+ let plan = LogicalPlanBuilder::from(scan_tpch_table("customer"))
+ .filter(col("customer.c_custkey").eq(scalar_subquery(sq)))?
+ .project(vec![col("customer.c_custkey")])?
+ .build()?;
+
+ let mut options = ConfigOptions::default();
+ options.optimizer.filter_null_join_keys = true;
Review Comment:
Is this setting needed? Or maybe it is just copy/paste?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]