[jira] [Updated] (IMPALA-11802) Optimize count(*) queries for Iceberg V2 tables

Gabor Kaszab (Jira) Mon, 24 Apr 2023 05:29:05 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gabor Kaszab updated IMPALA-11802:
----------------------------------
        Parent: IMPALA-12087
    Issue Type: Sub-task  (was: Bug)

> Optimize count(*) queries for Iceberg V2 tables
> -----------------------------------------------
>
>                 Key: IMPALA-11802
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11802
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Li Penglin
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.3.0
>
>
> Simple {{SELECT count( * ) FROM ice_v2_tbl;}} could be optimized.
> At first we need to investigate if the following is true:
> If a V2 table only has position delete files, then the cardinality is
> {noformat}
> Cardinality(data files) - Cardinality(delete files)
> {noformat}
> If this is true, then we can answer count( * ) queries via a query rewrite 
> similarly to what we do for V1 tables: IMPALA-11279
> If the above is not true, we can still optimize count( * ) queries by:
> {noformat}
>         SUM
>          |
>      UNION ALL
>       /     \
>      /       \
>     /         \
> COUNT(*)     COUNT(*)
>   /                \
> SCAN             ANTI JOIN
> data files         /      \
> without           /        \
> deletes       SCAN         SCAN
>               data files   delete files
>               with deletes
> {noformat}
> The SCAN operator with "data files without deletes" could benefit from count( 
> * ) optimization (they would only need to read file metadata). In the common 
> case (when there are few deletes) this SCAN is in charge of scanning the vast 
> majority of data files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (IMPALA-11802) Optimize count(*) queries for Iceberg V2 tables

Reply via email to