alamb commented on code in PR #13039:
URL: https://github.com/apache/datafusion/pull/13039#discussion_r1809481258
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2889,7 +2890,9 @@ impl DistinctOn {
// Check that the left-most sort expressions are the same as the `ON`
expressions.
let mut matched = true;
for (on, sort) in self.on_expr.iter().zip(sort_expr.iter()) {
- if on != &sort.expr {
+ if normalize_col(on.clone(), &self.input)?
Review Comment:
It seems like the `sort` is already normalized;
https://github.com/apache/datafusion/blob/2e274bfbdfc19f11b2951456bb2a48e88733c9bf/datafusion/expr/src/expr_rewriter/mod.rs#L127-L128
Likewise isn't `self.on_exprs` is also already normalized ?
https://github.com/apache/datafusion/blob/c3a919a9b87646c531f96e39e7217c11480b3118/datafusion/expr/src/logical_plan/plan.rs#L2858-L2859
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -3870,6 +3873,39 @@ digraph {
);
}
+ #[test]
Review Comment:
Could you also please add a sqllogictest? Perhaps by extending
https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/distinct_on.slt
?
##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -3870,6 +3873,39 @@ digraph {
);
}
+ #[test]
+ fn distinct_on_expr_order_by_mismatch() -> Result<()> {
+ let schema = Schema::new(vec![
+ Field::new("a", DataType::Int32, false),
+ Field::new("b", DataType::Int32, false),
+ ]);
+
+ let table_scan = table_scan(TableReference::none(), &schema,
None)?.build()?;
+ let p = DistinctOn::try_new(
+ vec![col("a")],
+ vec![],
+ Some(vec![SortExpr::new(col("b"), true, false)]),
+ Arc::new(table_scan),
+ );
+ assert_eq!(p.err().unwrap().strip_backtrace(), "Error during planning:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions");
+ Ok(())
+ }
+
+ #[test]
+ fn distinct_on_expr_order_by_match() -> Result<()> {
Review Comment:
When I revert the code changes in this PR, these tests still pass 🤔 -- so I
am not sure this PR solves the issue
```diff
diff --git a/datafusion/expr/src/logical_plan/plan.rs
b/datafusion/expr/src/logical_plan/plan.rs
index 6f4e78ccf..9d7d893be 100644
--- a/datafusion/expr/src/logical_plan/plan.rs
+++ b/datafusion/expr/src/logical_plan/plan.rs
@@ -28,8 +28,7 @@ use super::DdlStatement;
use crate::builder::{change_redundant_column, unnest_with_options};
use crate::expr::{Placeholder, Sort as SortExpr, WindowFunction};
use crate::expr_rewriter::{
- create_col_from_scalar_expr, normalize_col, normalize_cols,
normalize_sorts,
- NamePreserver,
+ create_col_from_scalar_expr, normalize_cols, normalize_sorts,
NamePreserver,
};
use crate::logical_plan::display::{GraphvizVisitor, IndentVisitor};
use crate::logical_plan::extension::UserDefinedLogicalNode;
@@ -2890,9 +2889,7 @@ impl DistinctOn {
// Check that the left-most sort expressions are the same as the
`ON` expressions.
let mut matched = true;
for (on, sort) in self.on_expr.iter().zip(sort_expr.iter()) {
- if normalize_col(on.clone(), &self.input)?
- != normalize_col(sort.expr.clone(), &self.input)?
- {
+ if on != &sort.expr {
matched = false;
break;
}
```
```shell
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo test -p
datafusion-expr -- distinct_on
Compiling datafusion-expr v42.1.0
(/Users/andrewlamb/Software/datafusion/datafusion/expr)
warning: unused variable: `p`
--> datafusion/expr/src/logical_plan/plan.rs:3897:13
|
3897 | let p = DistinctOn::try_new(
| ^ help: if this is intentional, prefix it with an
underscore: `_p`
|
= note: `#[warn(unused_variables)]` on by default
warning: `datafusion-expr` (lib test) generated 1 warning
Finished `test` profile [unoptimized + debuginfo] target(s) in 2.99s
Running unittests src/lib.rs
(target/debug/deps/datafusion_expr-3be1ac638c9d6fb5)
running 2 tests
test logical_plan::plan::tests::distinct_on_expr_order_by_match ... ok
test logical_plan::plan::tests::distinct_on_expr_order_by_mismatch ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 107 filtered
out; finished in 0.00s
Doc-tests datafusion_expr
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 45 filtered out;
finished in 0.01s
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]