jonathanc-n commented on code in PR #17153:
URL: https://github.com/apache/datafusion/pull/17153#discussion_r2277796066


##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -369,6 +370,106 @@ pub struct HashJoinExec {
     dynamic_filter: Arc<DynamicFilterPhysicalExpr>,
 }
 
+/// Check if a physical expression only references columns from the left child
+fn is_left_only_predicate(
+    predicate: &Arc<dyn PhysicalExpr>,
+    left_schema_len: usize,
+) -> bool {
+    let columns = collect_columns(predicate);
+    // All column indices must be less than left_schema_len
+    columns.iter().all(|col| col.index() < left_schema_len)
+}
+
+/// Create child filter descriptions that respect join semantics
+///
+/// This function analyzes parent filters and determines which ones can be 
pushed
+/// to each child based on the join type's preservation properties and column 
analysis.
+fn create_join_aware_child_descriptions(
+    parent_filters: &[Arc<dyn PhysicalExpr>],
+    left_child: &Arc<dyn ExecutionPlan>,
+    right_child: &Arc<dyn ExecutionPlan>,
+    join_type: JoinType,
+) -> Result<(
+    crate::filter_pushdown::ChildFilterDescription,
+    crate::filter_pushdown::ChildFilterDescription,
+)> {
+    let (left_preserved, right_preserved) = join_type.lr_is_preserved();
+    let left_schema_len = left_child.schema().fields().len();
+
+    // For left child: apply schema analysis + join-specific rules
+    let mut left_parent_filters = Vec::new();
+    // For right child: apply schema analysis + join-specific rules
+    let mut right_parent_filters = Vec::new();
+
+    for filter in parent_filters {
+        // Left child analysis
+        if left_preserved && is_left_only_predicate(filter, left_schema_len) {
+            // Check if the filter can actually be pushed based on schema
+            let temp_left_desc =
+                crate::filter_pushdown::ChildFilterDescription::from_child(
+                    &[Arc::clone(filter)],
+                    left_child,
+                )?;
+
+            if let Some(first_filter) = temp_left_desc.parent_filters.first() {
+                left_parent_filters.push(first_filter.clone());
+            } else {
+                left_parent_filters.push(
+                    
crate::filter_pushdown::PushedDownPredicate::unsupported(Arc::clone(
+                        filter,
+                    )),
+                );
+            }
+        } else {
+            left_parent_filters.push(
+                
crate::filter_pushdown::PushedDownPredicate::unsupported(Arc::clone(
+                    filter,
+                )),
+            );
+        }
+
+        // Right child analysis
+        if right_preserved && !is_left_only_predicate(filter, left_schema_len) 
{

Review Comment:
   I'm assuming so, or we end up pushing mixed predicates into the `from_child` 
call, which is unnecessary



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to