alamb commented on code in PR #9800:
URL: https://github.com/apache/arrow-datafusion/pull/9800#discussion_r1541124956
##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -968,6 +991,78 @@ fn estimate_inner_join_cardinality(
}
}
+/// Estimates semi join cardinality based on statistics.
+///
+/// The estimation result is either zero, in cases inputs statistics are non-overlapping
+/// or equal to number of rows for outer input.
+fn estimate_semi_join_cardinality(
Review Comment:
Long term, it would be really nice to pull these types of calculations into
a trait (i.e. an extensibility API).
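As a rough illustration of what such an extension point might look like, here is a minimal sketch. Everything in it (`InputStats`, `SemiJoinCardinalityEstimator`, `RangeOverlapEstimator`, `plan_estimate`) is a hypothetical name made up for this example and is not part of DataFusion or of this PR. The default implementation only mirrors the rule stated in the doc comment above: zero when the key ranges do not overlap, otherwise the outer row count.

```rust
/// Simplified stand-in for the statistics of one join input.
/// Hypothetical type for illustration only; not DataFusion's `Statistics`.
#[derive(Debug, Clone, Copy)]
pub struct InputStats {
    pub num_rows: usize,
    /// Minimum and maximum of the join key column.
    pub min: i64,
    pub max: i64,
}

/// Hypothetical extension point: anything that can guess how many rows
/// a semi join will produce.
pub trait SemiJoinCardinalityEstimator {
    /// Returns `None` when no estimate can be made.
    fn estimate(&self, outer: &InputStats, inner: &InputStats) -> Option<usize>;
}

/// Default estimator following the rule in the doc comment:
/// zero if the key ranges are disjoint, otherwise the outer row count.
pub struct RangeOverlapEstimator;

impl SemiJoinCardinalityEstimator for RangeOverlapEstimator {
    fn estimate(&self, outer: &InputStats, inner: &InputStats) -> Option<usize> {
        let disjoint = outer.max < inner.min || inner.max < outer.min;
        Some(if disjoint { 0 } else { outer.num_rows })
    }
}

/// Callers depend only on the trait, so a different estimator can be swapped in.
pub fn plan_estimate(
    estimator: &dyn SemiJoinCardinalityEstimator,
    outer: &InputStats,
    inner: &InputStats,
) -> Option<usize> {
    estimator.estimate(outer, inner)
}
```

With something along these lines, a user with better statistics (histograms, for example) could supply their own implementation without touching the planner code.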
##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -888,10 +888,45 @@ fn estimate_join_cardinality(
})
}
- JoinType::LeftSemi
- | JoinType::RightSemi
- | JoinType::LeftAnti
- | JoinType::RightAnti => None,
+ JoinType::LeftSemi | JoinType::LeftAnti => {
+ let cardinality = estimate_semi_join_cardinality(
Review Comment:
It doesn't seem correct to me that the same calculation is used for both
Semi and Anti joins. Shouldn't they be the inverse of each other? A semi join
keeps the outer rows that have a match, while an anti join keeps the ones
that don't.
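To make the concern concrete, here is a small sketch that treats the anti join estimate as the complement of the semi join estimate. The function names and the `ranges_overlap` flag are assumptions made up for this example, not the PR's code. Note that under the coarse rule from the doc comment the complement is 0 whenever the ranges overlap, which is only a lower bound; a real estimator might prefer a more conservative value there.

```rust
/// Semi join estimate following the rule quoted in the diff:
/// zero for non-overlapping key ranges, otherwise the outer row count.
/// Hypothetical helper for illustration only.
pub fn semi_estimate(outer_rows: usize, ranges_overlap: bool) -> usize {
    if ranges_overlap {
        outer_rows
    } else {
        0
    }
}

/// Anti join estimate as the complement of the semi join estimate:
/// every outer row is either matched (semi) or unmatched (anti).
pub fn anti_estimate(outer_rows: usize, ranges_overlap: bool) -> usize {
    outer_rows.saturating_sub(semi_estimate(outer_rows, ranges_overlap))
}
```

In particular, for non-overlapping inputs the semi join should produce no rows while the anti join should produce all outer rows, so sharing a single calculation gives the wrong answer for one of the two.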