alamb commented on code in PR #9800:
URL: https://github.com/apache/arrow-datafusion/pull/9800#discussion_r1541124956


##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -968,6 +991,78 @@ fn estimate_inner_join_cardinality(
     }
 }
 
+/// Estimates semi join cardinality based on statistics.
+///
+/// The estimation result is either zero, in cases inputs statistics are non-overlapping
+/// or equal to number of rows for outer input.
+fn estimate_semi_join_cardinality(

Review Comment:
   Long term, it would be really nice to pull these kinds of calculations into a trait (i.e., an extensibility API).
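   As a rough illustration of what such an extension point might look like, here is a minimal sketch. Every name in it (`InputRows`, `JoinCardinalityEstimator`, `CrossProductBound`, `estimate_with`) is hypothetical and not an existing DataFusion API; the cross-product bound is only a stand-in strategy.

```rust
/// Hypothetical row-count summary for one join input (not a DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct InputRows {
    /// Number of rows, if known.
    pub num_rows: Option<usize>,
}

/// Hypothetical extension point: one implementation per estimation strategy.
pub trait JoinCardinalityEstimator {
    /// Returns an estimated output row count, or `None` when it is unknown.
    fn estimate(&self, left: InputRows, right: InputRows) -> Option<usize>;
}

/// Stand-in strategy: the cross product as a trivial upper bound.
pub struct CrossProductBound;

impl JoinCardinalityEstimator for CrossProductBound {
    fn estimate(&self, left: InputRows, right: InputRows) -> Option<usize> {
        // `None` if either side is unknown or the product overflows.
        left.num_rows?.checked_mul(right.num_rows?)
    }
}

/// Callers depend only on the trait, so strategies can be swapped.
pub fn estimate_with(
    estimator: &dyn JoinCardinalityEstimator,
    left: InputRows,
    right: InputRows,
) -> Option<usize> {
    estimator.estimate(left, right)
}
```

   With something along these lines, a semi join estimator could be registered as just another implementation rather than a free function in `utils.rs`.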



##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -888,10 +888,45 @@ fn estimate_join_cardinality(
             })
         }
 
-        JoinType::LeftSemi
-        | JoinType::RightSemi
-        | JoinType::LeftAnti
-        | JoinType::RightAnti => None,
+        JoinType::LeftSemi | JoinType::LeftAnti => {
+            let cardinality = estimate_semi_join_cardinality(

Review Comment:
   It doesn't seem correct to me that the same calculation is used for both Semi and Anti joins (shouldn't they be the inverse of each other?).
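   To make the concern concrete, here is a small sketch using exact counts rather than statistics. The helper names (`semi_join_rows`, `anti_join_rows`) are hypothetical and not part of this PR. For exact counts, every outer row lands in exactly one of the two results, so the anti join count is the outer row count minus the semi join count.

```rust
use std::collections::HashSet;

/// Exact left-semi row count: outer rows whose key has a match in `inner`.
fn semi_join_rows(outer: &[i64], inner: &[i64]) -> usize {
    let keys: HashSet<i64> = inner.iter().copied().collect();
    outer.iter().filter(|k| keys.contains(*k)).count()
}

/// Exact left-anti row count: outer rows with no match in `inner`,
/// i.e. the complement of the semi join within the outer input.
fn anti_join_rows(outer: &[i64], inner: &[i64]) -> usize {
    outer.len() - semi_join_rows(outer, inner)
}
```

   In particular, when the key ranges do not overlap, the semi join yields zero rows while the anti join yields every outer row, so returning zero for both cases would be wrong for `LeftAnti`.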



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.