Zoltán Borók-Nagy created IMPALA-12371:
------------------------------------------
Summary: Add better cardinality estimation for Iceberg V2 tables with deletes
Key: IMPALA-12371
URL: https://issues.apache.org/jira/browse/IMPALA-12371
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Zoltán Borók-Nagy
IMPALA-11797 is about the generic case, i.e. better cardinality for all ANTI
JOIN operators.
For Iceberg V2 we can safely come up with a better cardinality estimate, because
we can assume that every row on the RHS has a match on the LHS when there is no
filtering. However, the RHS might contain duplicate rows, see:
https://github.com/apache/iceberg/blob/462a203e67dd42d111a7fd2d3a0090b5aeb80833/api/src/main/java/org/apache/iceberg/RowDelta.java#L132-L133
So we can come up with something like this:
Cardinality of DELETE operator =
    Cardinality(LHS) - (Cardinality(RHS) * selectivity of LHS)
We need some safety checks in case the result becomes negative (due to
duplicates in RHS).
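
A rough sketch of the estimate above, in Python for brevity. This is only an
illustration, not the planner code: the function name, the parameter names, and
the choice to clamp a negative result to zero are assumptions, since the issue
only says that some safety check is needed.

```python
def estimate_delete_cardinality(lhs_card: int, rhs_card: int,
                                lhs_selectivity: float) -> int:
    """Hypothetical helper: estimated output rows of the Iceberg DELETE
    (anti join) operator.

    lhs_card        -- Cardinality(LHS), the data rows
    rhs_card        -- Cardinality(RHS), the delete rows
    lhs_selectivity -- selectivity of the LHS filtering, in [0, 1]
    """
    if lhs_card < 0 or rhs_card < 0:
        raise ValueError("cardinalities must be non-negative")
    if not 0.0 <= lhs_selectivity <= 1.0:
        raise ValueError("selectivity must be within [0, 1]")

    # Only the fraction of delete rows that survives the LHS filtering is
    # expected to find a match.
    matched_deletes = rhs_card * lhs_selectivity
    estimate = lhs_card - matched_deletes

    # Assumed safety check: duplicates in the RHS can push the estimate
    # below zero, so clamp it.
    return max(0, round(estimate))


if __name__ == "__main__":
    print(estimate_delete_cardinality(1000, 200, 0.5))
    print(estimate_delete_cardinality(100, 500, 1.0))
```

With 1000 LHS rows, 200 delete rows and a selectivity of 0.5, the sketch
subtracts 100 rows. In the second call the RHS is larger than the LHS, which
can only happen with duplicate delete rows, so the result is clamped.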