[
https://issues.apache.org/jira/browse/IMPALA-11787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy resolved IMPALA-11787.
----------------------------------------
Fix Version/s: Impala 4.3.0
Resolution: Fixed
> Cardinality estimate for UNION in Iceberg position-delete plans can double
> the actual table cardinality
> -------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-11787
> URL: https://issues.apache.org/jira/browse/IMPALA-11787
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: David Rorke
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.3.0
>
> Attachments: q78_iceberg_update_profile.txt.gz
>
>
> The plan for Iceberg tables with position-delete files includes a UNION
> operator that takes the following inputs:
> * LHS: Scan of the data files that don't have corresponding delete files
> * RHS: ANTI JOIN that filters the data files that do have corresponding
> delete files based on the content of the delete files.
> The planner's cardinality estimates for each of these two inputs to the UNION
> can be as large as the full row count of the table (assuming no other
> predicates in the scan) and the planner simply sums these in the UNION which
> can result in a cardinality estimate for the UNION that's twice the size of
> the table. For example here's an excerpt from the TPC-DS query 78 summary
> (the total row count for the full store_sales table is 8.64B and the planner
> estimate of the union cardinality is twice that):
>
> {noformat}
> Operator #Hosts #Inst Avg Time Max Time
> #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
> ...
> | | 04:UNION 10 120 9.168ms 17.416ms
> 1.66B 17.28B 0 0
> | | |--02:DELETE EVENTS HASH JOIN 10 120 138.821us 615.342us
> 0 8.64B 42.12 KB 0 LEFT ANTI JOIN, BROADCAST
> | | | |--F22:JOIN BUILD 10 10 5s741ms 5s944ms
> 3.09 GB 3.07 GB
> | | | | 35:EXCHANGE 10 10 212.271ms 264.374ms
> 12.84M 12.84M 14.98 MB 36.84 MB BROADCAST
> | | | | F12:EXCHANGE SENDER 10 120 30.579ms 40.774ms
> 60.94 KB 0
> | | | | 01:SCAN S3 10 120 75.266ms 417.286ms
> 12.84M 12.84M 468.96 KB 16.00 MB
> tpcds_3000_iceberg_parquet_v2_update_q1_98.store_sales-POSITION-DELETE-01
> tpcds_3000_iceberg_parquet_v2_update_q1_98.store_sales-position-delete
> | | | 00:SCAN S3 10 120 2s645ms 10s064ms
> 0 8.64B 4.01 MB 16.00 MB
> tpcds_3000_iceberg_parquet_v2_update_q1_98.store_sales
> | | 03:SCAN S3 10 120 2s413ms 4s484ms
> 1.66B 8.64B 103.32 MB 88.00 MB
> tpcds_3000_iceberg_parquet_v2_update_q1_98.store_sales
>
> {noformat}
> The planner should account for the fact that each side of the UNION is only
> scanning a subset of the table (so the UNION cardinality can't be greater
> than the actual table size) and also if possible try to estimate the impact
> of the filtering done by the ANTI JOIN.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]