nuno-faria commented on code in PR #21715:
URL: https://github.com/apache/datafusion/pull/21715#discussion_r3115841265


##########
datafusion/sqllogictest/test_files/group_by.slt:
##########
@@ -3565,33 +3565,27 @@ SELECT r.sn, r.amount, SUM(r.amount)
   GROUP BY r.sn
 ORDER BY r.sn
 
-# left semi join should propagate constraint of left side as is.
-query IRR
+# left semi join with a nullable UNIQUE key cannot safely propagate the
+# constraint for expansion, because UNIQUE allows multiple NULLs.
+statement error DataFusion error: Error during planning: Column in SELECT must 
be in GROUP BY or an aggregate function: While expanding wildcard, column 
"l\.amount" must appear in the GROUP BY clause or must be part of an aggregate 
function, currently only "l\.sn, sum\(l\.amount\)" appears in the SELECT clause 
satisfies this requirement
 SELECT l.sn, l.amount, SUM(l.amount)
   FROM (SELECT *
     FROM sales_global_with_unique as l
     LEFT SEMI JOIN sales_global_with_unique as r
     ON l.amount >= r.amount + 10)
   GROUP BY l.sn
 ORDER BY l.sn
-----
-1 50 50
-2 75 75
-3 200 200
-4 100 100
-NULL 100 100
 
-# Similarly, left anti join should propagate constraint of left side as is.
-query IRR
+# Similarly, left anti join with a nullable UNIQUE key cannot safely propagate
+# the constraint for expansion.
+statement error DataFusion error: Error during planning: Column in SELECT must 
be in GROUP BY or an aggregate function: While expanding wildcard, column 
"l\.amount" must appear in the GROUP BY clause or must be part of an aggregate 
function, currently only "l\.sn, sum\(l\.amount\)" appears in the SELECT clause 
satisfies this requirement
 SELECT l.sn, l.amount, SUM(l.amount)
   FROM (SELECT *
     FROM sales_global_with_unique as l
     LEFT ANTI JOIN sales_global_with_unique as r
     ON l.amount >= r.amount + 10)
   GROUP BY l.sn
 ORDER BY l.sn

Review Comment:
   My guess is that these tests change due to not calling 
`with_add_implicit_group_by_exprs` anymore. I think this is ok, and will now be 
close to Postgres and DuckDB. But it's probably best to add it to the upgrade 
guide.



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2431,7 +2434,14 @@ impl SubqueryAlias {
         // Requalify fields with the new `alias`.
         let fields = plan.schema().fields().clone();
         let meta_data = plan.schema().metadata().clone();
-        let func_dependencies = 
plan.schema().functional_dependencies().clone();
+        // Recursive queries do not expose the anchor's functional 
dependencies to
+        // the outer schema — the recursive term can produce rows that violate
+        // those dependencies, so they are intentionally dropped here.
+        let func_dependencies = if is_recursive_query {
+            FunctionalDependencies::empty()
+        } else {
+            plan.schema().functional_dependencies().clone()
+        };

Review Comment:
   It's not clear to me why is this change necessary for recursive queries? 
Could we add a test case for this?



##########
datafusion/sql/src/select.rs:
##########
@@ -966,12 +968,13 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
         group_by_exprs: &[Expr],
         aggr_exprs: &[Expr],
     ) -> Result<AggregatePlanResult> {
+        let group_by_exprs =
+            self.add_required_group_by_exprs_from_dependencies(input, 
group_by_exprs)?;
+
         // create the aggregate plan
-        let options =
-            
LogicalPlanBuilderOptions::new().with_add_implicit_group_by_exprs(true);
         let plan = LogicalPlanBuilder::from(input.clone())
-            .with_options(options)
-            .aggregate(group_by_exprs.to_vec(), aggr_exprs.to_vec())?

Review Comment:
   I think not adding `with_add_implicit_group_by_exprs` anymore will change 
existing queries. Maybe we should add this to the upgrade guide.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to