alamb commented on code in PR #13039: URL: https://github.com/apache/datafusion/pull/13039#discussion_r1809481258
########## datafusion/expr/src/logical_plan/plan.rs: ########## @@ -2889,7 +2890,9 @@ impl DistinctOn { // Check that the left-most sort expressions are the same as the `ON` expressions. let mut matched = true; for (on, sort) in self.on_expr.iter().zip(sort_expr.iter()) { - if on != &sort.expr { + if normalize_col(on.clone(), &self.input)? Review Comment: It seems like the `sort` is already normalized; https://github.com/apache/datafusion/blob/2e274bfbdfc19f11b2951456bb2a48e88733c9bf/datafusion/expr/src/expr_rewriter/mod.rs#L127-L128 Likewise isn't `self.on_exprs` is also already normalized ? https://github.com/apache/datafusion/blob/c3a919a9b87646c531f96e39e7217c11480b3118/datafusion/expr/src/logical_plan/plan.rs#L2858-L2859 ########## datafusion/expr/src/logical_plan/plan.rs: ########## @@ -3870,6 +3873,39 @@ digraph { ); } + #[test] Review Comment: Could you also please add a sqllogictest? Perhaps by extending https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/distinct_on.slt ? ########## datafusion/expr/src/logical_plan/plan.rs: ########## @@ -3870,6 +3873,39 @@ digraph { ); } + #[test] + fn distinct_on_expr_order_by_mismatch() -> Result<()> { + let schema = Schema::new(vec![ + Field::new("a", DataType::Int32, false), + Field::new("b", DataType::Int32, false), + ]); + + let table_scan = table_scan(TableReference::none(), &schema, None)?.build()?; + let p = DistinctOn::try_new( + vec![col("a")], + vec![], + Some(vec![SortExpr::new(col("b"), true, false)]), + Arc::new(table_scan), + ); + assert_eq!(p.err().unwrap().strip_backtrace(), "Error during planning: SELECT DISTINCT ON expressions must match initial ORDER BY expressions"); + Ok(()) + } + + #[test] + fn distinct_on_expr_order_by_match() -> Result<()> { Review Comment: When I revert the code changes in this PR, these tests still pass 🤔 -- so I am not sure this PR solves the issue ```diff diff --git a/datafusion/expr/src/logical_plan/plan.rs b/datafusion/expr/src/logical_plan/plan.rs index 6f4e78ccf..9d7d893be 100644 --- a/datafusion/expr/src/logical_plan/plan.rs +++ b/datafusion/expr/src/logical_plan/plan.rs @@ -28,8 +28,7 @@ use super::DdlStatement; use crate::builder::{change_redundant_column, unnest_with_options}; use crate::expr::{Placeholder, Sort as SortExpr, WindowFunction}; use crate::expr_rewriter::{ - create_col_from_scalar_expr, normalize_col, normalize_cols, normalize_sorts, - NamePreserver, + create_col_from_scalar_expr, normalize_cols, normalize_sorts, NamePreserver, }; use crate::logical_plan::display::{GraphvizVisitor, IndentVisitor}; use crate::logical_plan::extension::UserDefinedLogicalNode; @@ -2890,9 +2889,7 @@ impl DistinctOn { // Check that the left-most sort expressions are the same as the `ON` expressions. let mut matched = true; for (on, sort) in self.on_expr.iter().zip(sort_expr.iter()) { - if normalize_col(on.clone(), &self.input)? - != normalize_col(sort.expr.clone(), &self.input)? - { + if on != &sort.expr { matched = false; break; } ``` ```shell (venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo test -p datafusion-expr -- distinct_on Compiling datafusion-expr v42.1.0 (/Users/andrewlamb/Software/datafusion/datafusion/expr) warning: unused variable: `p` --> datafusion/expr/src/logical_plan/plan.rs:3897:13 | 3897 | let p = DistinctOn::try_new( | ^ help: if this is intentional, prefix it with an underscore: `_p` | = note: `#[warn(unused_variables)]` on by default warning: `datafusion-expr` (lib test) generated 1 warning Finished `test` profile [unoptimized + debuginfo] target(s) in 2.99s Running unittests src/lib.rs (target/debug/deps/datafusion_expr-3be1ac638c9d6fb5) running 2 tests test logical_plan::plan::tests::distinct_on_expr_order_by_match ... ok test logical_plan::plan::tests::distinct_on_expr_order_by_mismatch ... ok test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 107 filtered out; finished in 0.00s Doc-tests datafusion_expr running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 45 filtered out; finished in 0.01s ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org