kosiew commented on code in PR #20231:
URL: https://github.com/apache/datafusion/pull/20231#discussion_r2870577575


##########
datafusion/physical-plan/src/filter.rs:
##########
@@ -2066,4 +2097,46 @@ mod tests {
 
         Ok(())
     }
+
+    /// Regression test for https://github.com/apache/datafusion/issues/20194
+    ///
+    /// `collect_columns_from_predicate_inner` should only extract equality
+    /// pairs where at least one side is a Column. Pairs like
+    /// `complex_expr = literal` must not create equivalence classes because
+    /// `normalize_expr`'s deep traversal would replace the literal inside
+    /// unrelated expressions (e.g. sort keys) with the complex expression.
+    #[tokio::test]
+    async fn test_collect_columns_skips_non_column_pairs() -> Result<()> {

Review Comment:
   `test_collect_columns_skips_non_column_pairs` does not use async behavior.
   
   Could this be a plain `#[test]`? Keeping it sync makes scope a bit clearer 
since no async execution is involved.



##########
datafusion/physical-plan/src/filter.rs:
##########
@@ -979,6 +998,18 @@ fn collect_columns_from_predicate_inner(
     let predicates = split_conjunction(predicate);
     predicates.into_iter().for_each(|p| {
         if let Some(binary) = p.as_any().downcast_ref::<BinaryExpr>() {
+            // Only extract pairs where at least one side is a Column 
reference.
+            // Pairs like `complex_expr = literal` should not create 
equivalence
+            // classes — the literal could appear in many unrelated expressions
+            // (e.g. sort keys), and normalize_expr's deep traversal would
+            // replace those occurrences with the complex expression, 
corrupting
+            // sort orderings. Constant propagation for such pairs is handled
+            // separately by `extend_constants`.
+            let has_column = 
binary.left().as_any().downcast_ref::<Column>().is_some()

Review Comment:
   `has_column` currently means "one side is directly a `Column` expression," 
not "expression contains a column."
   
   Would `has_direct_column_operand` (or similar) be clearer here? It matches 
the intentional exclusion of complex_expr=literal pairs and avoids misreading 
this as recursive column detection.



##########
datafusion/physical-plan/src/filter.rs:
##########
@@ -359,18 +359,37 @@ impl FilterExec {
             if let Some(binary) = 
conjunction.as_any().downcast_ref::<BinaryExpr>()
                 && binary.op() == &Operator::Eq
             {
-                // Filter evaluates to single value for all partitions
-                if input_eqs.is_expr_constant(binary.left()).is_some() {
-                    let across = input_eqs
-                        .is_expr_constant(binary.right())
-                        .unwrap_or_default();
+                // Check if either side is constant — either already known
+                // constant from the input equivalence properties, or a literal
+                // value (which is inherently constant across all partitions).
+                let left_const =
+                    input_eqs.is_expr_constant(binary.left()).or_else(|| {
+                        binary
+                            .left()
+                            .as_any()
+                            .downcast_ref::<Literal>()
+                            .map(|l| 
AcrossPartitions::Uniform(Some(l.value().clone())))
+                    });
+                let right_const =
+                    input_eqs.is_expr_constant(binary.right()).or_else(|| {

Review Comment:
   Current logic repeats nearly identical 
`is_expr_constant(...).or_else(literal...)` blocks for left/right operands.
   
   Could we extract an `expr_constant_or_literal(expr, input_eqs)` helper here? 
It would remove duplication and centralize literal/const semantics used by 
filter equivalence inference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to