[jira] [Commented] (IMPALA-11986) Optimize MIN(part_col)/ MAX(part_col)/ COUNT(DISTINCT part_col)/ queries for Iceberg tables

Jira Wed, 17 Dec 2025 02:43:22 -0800


    [ 
https://issues.apache.org/jira/browse/IMPALA-11986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045794#comment-18045794
 ]


Noémi Pap-Takács commented on IMPALA-11986:
-------------------------------------------

This could be done based on the logic of the partition table introduced in 
IMPALA-13267, after some changes expected in IMPALA-14564.

> Optimize MIN(part_col)/ MAX(part_col)/ COUNT(DISTINCT part_col)/ queries for 
> Iceberg tables
> -------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-11986
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11986
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Li Penglin
>            Assignee: Noémi Pap-Takács
>            Priority: Major
>              Labels: impala-iceberg, performance
>
> For Iceberg V1 and V2 tables without deletes:
> [https://impala.apache.org/docs/build/html/topics/impala_optimize_partition_key_scans.html]
>  OPTIMIZE_PARTITION_KEY_SCANS optimizes MIN(key_column), MAX(key_column), 
> and COUNT(DISTINCT key_column) using the 'TBLS' table and the 
> 'PARTITION_KEY_VALS' partition key column in the HMS metadata. For Iceberg 
> tables, partition stats are not stored in the HMS, but they can be obtained 
> through the Iceberg API. We can optimize MIN(key_column), MAX(key_column), 
> and COUNT(DISTINCT key_column) queries with a similar idea, but we should 
> make sure that the partition transform is 'identity'.
> For non-partition columns, if min-max information is stored in the Iceberg 
> metadata, could MIN(column) and MAX(column) queries also be optimized based 
> on this idea?
> However, Impala does not guarantee that the statistics for these 
> non-partition columns are complete, which makes this case unclear.
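
A rough sketch of the idea in the description, not Impala's actual code. It 
assumes a table without deletes and a single partition spec. The names 
PartitionField, DataFile, partition_key_aggregates and column_min_max are 
hypothetical and only illustrate the two checks mentioned above: the 
transform must be 'identity', and column bounds must be present for every 
file before they can be trusted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PartitionField:
    """Hypothetical stand-in for one field of an Iceberg partition spec."""
    source_column: str
    transform: str  # e.g. "identity", "bucket[16]", "day"


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for the per-file metadata read from manifests."""
    partition: Dict[str, Any]             # source column -> partition value
    bounds: Dict[str, Tuple[Any, Any]]    # column -> (lower, upper), if stored


def partition_key_aggregates(
    spec: List[PartitionField], files: List[DataFile], column: str
) -> Optional[Tuple[Any, Any, int]]:
    """Return (MIN, MAX, COUNT DISTINCT) from partition values alone.

    Returns None when the column is not identity-partitioned, in which case
    the query has to fall back to a regular scan.
    """
    is_identity = any(
        f.source_column == column and f.transform == "identity" for f in spec
    )
    if not is_identity:
        return None
    # NULL partition values are ignored, as the SQL aggregates ignore NULLs.
    values = {
        f.partition.get(column)
        for f in files
        if f.partition.get(column) is not None
    }
    if not values:
        return (None, None, 0)
    return (min(values), max(values), len(values))


def column_min_max(
    files: List[DataFile], column: str
) -> Optional[Tuple[Any, Any]]:
    """Return (MIN, MAX) from per-file bounds, or None if any file lacks them."""
    lows, highs = [], []
    for f in files:
        b = f.bounds.get(column)
        if b is None:
            return None  # incomplete statistics: the answer cannot be trusted
        lows.append(b[0])
        highs.append(b[1])
    if not lows:
        return None
    return (min(lows), max(highs))
```

With a spec of identity(year) and bucket[16](id), the first function would 
answer for year but return None for id, and the second returns None as soon 
as one file has no bounds for the requested column.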



