berkaysynnada commented on code in PR #12979:
URL: https://github.com/apache/datafusion/pull/12979#discussion_r1806062077
##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -1402,7 +1402,12 @@ pub fn validate_unique_names<'a>(
pub fn union(left_plan: LogicalPlan, right_plan: LogicalPlan) -> Result<LogicalPlan> {
// Temporarily use the schema from the left input and later rely on the analyzer to
// coerce the two schemas into a common one.
- let schema = Arc::clone(left_plan.schema());
+
+ // Functional dependencies are not preserved after a UNION operation
+ let schema = (**left_plan.schema()).clone();
+ let schema =
+ Arc::new(schema.with_functional_dependencies(FunctionalDependencies::empty())?);
+
Review Comment:
Is clearing out all dependencies the right fix? Could we retain the ones
that still hold after the union?
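
For context on the question above, here is a minimal, library-free sketch of why a dependency that holds on each input need not hold on their union. It does not use DataFusion's `FunctionalDependencies` type. The helper names `fd_holds` and `union_all` and the sample rows are made up for illustration. The pair `(c1, result)` mirrors the column names used in the test in this PR.

```rust
use std::collections::HashMap;

/// Returns true if every key in `rows` maps to exactly one value,
/// i.e. the functional dependency `c1 -> result` holds on `rows`.
/// (Hypothetical helper, not part of DataFusion.)
fn fd_holds(rows: &[(&str, i64)]) -> bool {
    let mut seen: HashMap<&str, i64> = HashMap::new();
    for &(key, value) in rows {
        // `insert` returns the previous value for this key, if any.
        if let Some(prev) = seen.insert(key, value) {
            if prev != value {
                return false;
            }
        }
    }
    true
}

/// UNION ALL of two inputs: plain concatenation, duplicates kept.
/// (Hypothetical helper, not part of DataFusion.)
fn union_all<'a>(
    left: &[(&'a str, i64)],
    right: &[(&'a str, i64)],
) -> Vec<(&'a str, i64)> {
    left.iter().chain(right.iter()).copied().collect()
}

fn main() {
    // Made-up rows: each side is the output of a GROUP BY on c1,
    // so c1 determines `result` within each side.
    let left = [("a", 1), ("b", 2)];
    let right = [("a", 10), ("b", 20)];

    assert!(fd_holds(&left));
    assert!(fd_holds(&right));

    // After the union, "a" maps to both 1 and 10, so the
    // dependency carried by the left schema no longer holds.
    assert!(!fd_holds(&union_all(&left, &right)));
}
```

Whether any subset of dependencies can be retained safely would depend on the semantics of the specific union, which this sketch does not attempt to settle.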
##########
datafusion/core/src/dataframe/mod.rs:
##########
@@ -2623,6 +2623,54 @@ mod tests {
Ok(())
}
+ #[tokio::test]
+ async fn test_aggregate_with_union() -> Result<()> {
+ let df = test_table().await?;
+
+ let df1 = df
+ .clone()
+ // GROUP BY `c1`
+ .aggregate(vec![col("c1")], vec![min(col("c2"))])?
+ // SELECT `c1` , min(c2) as `result`
+ .select(vec![col("c1"), min(col("c2")).alias("result")])?;
+ let df2 = df
+ .clone()
+ // GROUP BY `c1`
+ .aggregate(vec![col("c1")], vec![max(col("c3"))])?
+ // SELECT `c1` , max(c3) as `result`
+ .select(vec![col("c1"), max(col("c3")).alias("result")])?;
+
+ let df_union = df1.union(df2)?;
+ let df = df_union
+ // GROUP BY `c1`
+ .aggregate(
+ vec![col("c1")],
+ vec![sum(col("result")).alias("sum_result")],
+ )?
+ // SELECT `c1`, sum(result) as `sum_result`
+ .select(vec![(col("c1")), col("sum_result")])?;
+
+ let df_results = df.collect().await?;
+
+ #[rustfmt::skip]
+ assert_batches_sorted_eq!(
+ [
+ "+----+------------+",
+ "| c1 | sum_result |",
+ "+----+------------+",
+ "| a | 84 |",
+ "| b | 69 |",
+ "| c | 124 |",
+ "| d | 126 |",
+ "| e | 121 |",
+ "+----+------------+"
+ ],
+ &df_results
+ );
+
+ Ok(())
+ }
+
Review Comment:
I ran this test without the changes in this PR, and it fails as @Sevenannn
explained. The current result matches what PostgreSQL returns.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]