ethan-tyler commented on code in PR #19884:
URL: https://github.com/apache/datafusion/pull/19884#discussion_r2713448085


##########
datafusion/core/src/physical_planner.rs:
##########
@@ -1907,24 +1907,48 @@ fn get_physical_expr_pair(
 }
 
 /// Extract filter predicates from a DML input plan (DELETE/UPDATE).
-/// Walks the logical plan tree and collects Filter predicates,
-/// splitting AND conjunctions into individual expressions.
+/// Walks the logical plan tree and collects Filter predicates and any filters
+/// pushed down into TableScan nodes, splitting AND conjunctions into 
individual expressions.
 /// Column qualifiers are stripped so expressions can be evaluated against
-/// the TableProvider's schema.
+/// the TableProvider's schema. Deduplicates filters to avoid passing the same
+/// predicate twice when filters appear in both Filter and TableScan nodes.
 ///
 fn extract_dml_filters(input: &Arc<LogicalPlan>) -> Result<Vec<Expr>> {
     let mut filters = Vec::new();
 
     input.apply(|node| {
-        if let LogicalPlan::Filter(filter) = node {
-            // Split AND predicates into individual expressions
-            
filters.extend(split_conjunction(&filter.predicate).into_iter().cloned());
+        match node {
+            LogicalPlan::Filter(filter) => {
+                // Split AND predicates into individual expressions
+                
filters.extend(split_conjunction(&filter.predicate).into_iter().cloned());
+            }
+            LogicalPlan::TableScan(TableScan {
+                filters: scan_filters,
+                ..
+            }) => {
+                for filter in scan_filters {
+                    
filters.extend(split_conjunction(filter).into_iter().cloned());
+                }
+            }
+            _ => {}
         }
         Ok(TreeNodeRecursion::Continue)
     })?;
 
-    // Strip table qualifiers from column references
-    filters.into_iter().map(strip_column_qualifiers).collect()
+    // Strip table qualifiers from column references and deduplicate.
+    // Deduplication is necessary because filters may appear in both Filter 
nodes
+    // and TableScan.filters when the optimizer pushes some predicates down.
+    // We deduplicate by (unqualified) expression to avoid passing the same 
filter twice.
+    let mut seen_filters = HashSet::new();
+    let deduped = filters
+        .into_iter()
+        .map(strip_column_qualifiers)

Review Comment:
   Looks good - thanks for adding target-scan scoping and the mixed 
residual/pushdown coverage.
   
   One remaining concern: `extract_dml_filters` currently drops non-target 
conjuncts (`predicate_is_on_target` check in 
`datafusion/core/src/physical_planner.rs:1941`), and 
`test_update_from_drops_non_target_predicates` 
(`datafusion/core/tests/custom_sources_cases/dml_planning.rs:658`) codifies 
this as expected behavior. That can silently broaden row selection and worst 
case, if the join predicate is the only constraint, this becomes "update all 
rows".
   
   Would it make sense to fail closed instead (planning error if any WHERE 
conjunct references a non-target relation), and update that test to assert the 
error?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to