korowa commented on code in PR #9800:
URL: https://github.com/apache/arrow-datafusion/pull/9800#discussion_r1543497471
##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -888,10 +888,45 @@ fn estimate_join_cardinality(
})
}
- JoinType::LeftSemi
- | JoinType::RightSemi
- | JoinType::LeftAnti
- | JoinType::RightAnti => None,
+ JoinType::LeftSemi | JoinType::LeftAnti => {
+ let cardinality = estimate_semi_join_cardinality(
Review Comment:
Indeed, they were not correct. I've changed the estimates a bit: disjoint
statistics now affect only semi-joins (filtering the outer table should
produce zero rows). For anti-joins, special-casing disjoint inputs doesn't
seem to make much sense. If the statistics are non-overlapping, the result
equals the outer side's num_rows. Otherwise (no info, or overlapping
statistics) it is still estimated as the outer side's num_rows, since we know
nothing about the actual distribution besides min/max, and assuming that all
rows will be filtered out is too aggressive (it may significantly affect
further planning).
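
To make the rule above concrete, here is a minimal sketch of it. This is not the code from the PR: `KeyRange`, `Kind`, `is_disjoint` and `estimate_rows` are hypothetical names for illustration only, and the join key is assumed to be a single `i64` column with optional min/max statistics.

```rust
/// Hypothetical min/max summary of a join key column (not a DataFusion type).
#[derive(Clone, Copy, Debug)]
struct KeyRange {
    min: i64,
    max: i64,
}

/// Hypothetical stand-in for the semi/anti variants of `JoinType`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Semi,
    Anti,
}

/// True only when both ranges are known and do not intersect.
/// Missing statistics are treated as "possibly overlapping".
fn is_disjoint(outer: Option<KeyRange>, inner: Option<KeyRange>) -> bool {
    match (outer, inner) {
        (Some(o), Some(i)) => o.max < i.min || i.max < o.min,
        _ => false,
    }
}

/// Estimated output rows for a semi- or anti-join on the outer side.
fn estimate_rows(
    kind: Kind,
    outer_rows: usize,
    outer: Option<KeyRange>,
    inner: Option<KeyRange>,
) -> usize {
    match kind {
        // Semi-join with provably disjoint keys: no outer row can match.
        Kind::Semi if is_disjoint(outer, inner) => 0,
        // Everything else falls back to the outer row count: an anti-join on
        // disjoint keys keeps every outer row, and with overlapping or
        // missing statistics min/max alone say nothing about how many match.
        Kind::Semi | Kind::Anti => outer_rows,
    }
}
```

With these assumptions, only the semi-join over disjoint ranges deviates from the outer row count; the anti-join estimate ignores the ranges entirely.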
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.