[ https://issues.apache.org/jira/browse/IMPALA-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272187#comment-17272187 ]

Zoltán Borók-Nagy commented on IMPALA-10453:
--------------------------------------------

I think min/max filters would do a good job here by filtering out row groups whose column statistics fall outside the filter's range.
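
As a rough illustration of how a min/max runtime filter prunes row groups, here is a minimal Python sketch. It makes two assumptions. First, the filter carries the min and max of the build-side join keys. Second, each row group exposes min/max statistics for the probe column, as Parquet footers do. The names `MinMaxFilter`, `RowGroupStats` and `row_group_may_match` are hypothetical and are not Impala's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MinMaxFilter:
    """Hypothetical runtime filter: min/max of the build-side join keys."""
    min_value: int
    max_value: int


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group column statistics (as in a Parquet footer)."""
    min_value: Optional[int]
    max_value: Optional[int]


def row_group_may_match(f: MinMaxFilter, stats: RowGroupStats) -> bool:
    # Without statistics nothing can be proven, so the row group is kept.
    if stats.min_value is None or stats.max_value is None:
        return True
    # Skip the row group only if its [min, max] range does not overlap
    # the filter's [min, max] range.
    return stats.max_value >= f.min_value and stats.min_value <= f.max_value


def surviving_row_groups(f: MinMaxFilter, groups: List[RowGroupStats]) -> List[int]:
    """Return the indexes of row groups the scanner still has to read."""
    return [i for i, g in enumerate(groups) if row_group_may_match(f, g)]


if __name__ == "__main__":
    runtime_filter = MinMaxFilter(min_value=100, max_value=200)
    groups = [
        RowGroupStats(0, 50),       # entirely below the filter range: pruned
        RowGroupStats(150, 300),    # overlaps the range: kept
        RowGroupStats(None, None),  # no statistics: kept
        RowGroupStats(201, 400),    # entirely above the range: pruned
    ]
    print(surviving_row_groups(runtime_filter, groups))  # [1, 2]
```

The check is conservative by design. A row group is dropped only when its statistics prove that no row can pass the filter.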

I wonder whether we could put the partition-transformed values into the bloom filters; then they would be able to prune files using the associated partition data. However, we can't do that if the partition layout has evolved over time, unless we restrict pruning to files written with the current partition layout.
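
The idea can be sketched as follows. Insert transformed join keys into the Bloom filter, then compare them with each file's partition value. Only files written under the current partition spec are eligible for pruning. The sketch rests on four assumptions:

* The current spec partitions by Iceberg's integer `truncate(W, v)` transform.
* Each data file's spec id and transformed partition value are available. Iceberg metadata does record which partition spec each data file was written with, along with its partition tuple.
* The Bloom filter is a toy, not Impala's implementation.
* `BloomFilter`, `DataFile` and `prune_files` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not Impala's)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, value: int) -> Iterable[int]:
        # Derive num_hashes bit positions from slices of one SHA-256 digest.
        digest = hashlib.sha256(str(value).encode()).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.num_bits

    def add(self, value: int) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: int) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[pos] for pos in self._positions(value))


def truncate(width: int, value: int) -> int:
    """Iceberg's integer truncate transform: v - (((v % W) + W) % W)."""
    return value - (((value % width) + width) % width)


@dataclass
class DataFile:
    path: str
    spec_id: int          # partition spec the file was written with
    partition_value: int  # transformed partition value recorded for the file


def prune_files(files: List[DataFile], join_keys: Iterable[int],
                current_spec_id: int, width: int) -> List[str]:
    # Build side: insert the *transformed* join keys so the filter can be
    # compared directly with the per-file partition values.
    bloom = BloomFilter()
    for key in join_keys:
        bloom.add(truncate(width, key))
    kept = []
    for f in files:
        # Files from an older partition spec have partition values of a
        # different shape, so this filter cannot prune them.
        if f.spec_id != current_spec_id:
            kept.append(f.path)
        elif bloom.might_contain(f.partition_value):
            kept.append(f.path)
    return kept
```

For example, take files `a.parquet` (spec 1, value 10), `b.parquet` (spec 1, value 50) and `old.parquet` (spec 0, value 7), with join keys `[12, 17]` and `width=10`. Both keys truncate to 10, so `a.parquet` is kept. `old.parquet` is kept because it predates the current spec. `b.parquet` is pruned unless the Bloom filter happens to return a false positive.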

> Support file/partition pruning via runtime filters on Iceberg
> -------------------------------------------------------------
>
>                 Key: IMPALA-10453
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10453
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: iceberg, impala-iceberg, performance
>
> This is a placeholder to figure out what we'd need to do to support dynamic 
> file-level pruning in Iceberg using runtime filters, i.e. have parity for 
> partition pruning.
> * If there is a single partition value per file, then applying bloom filters 
> to the row group stats would be effective at pruning files.
> * If there are partition transforms, e.g. hash-based, then I think we 
> probably need to track the partition that the file is associated with and 
> then have some custom logic in the parquet scanner to do partition pruning.
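
The issue's first bullet applies bloom filters to row group stats. That can be sketched as below, assuming each row group exposes min/max statistics for the partition column. A Bloom filter only answers point-membership queries, so it can use the statistics only when a row group holds a single value (min == max). That is the case when a file contains a single partition value. To stay self-contained, the sketch uses a Python `set` as a stand-in for the Bloom filter. The set is exact, while a real Bloom filter exposes the same membership check but may return false positives. `ColumnStats`, `row_group_may_match` and `file_may_match` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Container, List, Optional


@dataclass
class ColumnStats:
    """Hypothetical min/max stats for one column of one Parquet row group."""
    min_value: Optional[int]
    max_value: Optional[int]


def row_group_may_match(runtime_filter: Container[int], stats: ColumnStats) -> bool:
    # No statistics: nothing can be proven, keep the row group.
    if stats.min_value is None or stats.max_value is None:
        return True
    # A range of values cannot be checked against a point-membership filter.
    if stats.min_value != stats.max_value:
        return True
    # Single value in the row group: probe the filter with it.
    return stats.min_value in runtime_filter


def file_may_match(runtime_filter: Container[int], row_groups: List[ColumnStats]) -> bool:
    # The whole file can be skipped when none of its row groups can match.
    return any(row_group_may_match(runtime_filter, rg) for rg in row_groups)
```

With the stand-in filter `{3, 7}`, a file whose row groups all have min == max == 3 is kept. A file whose single row group has min == max == 5 is skipped. A row group spanning 1..9 is always kept, because the filter cannot rule it out.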



--
This message was sent by Atlassian Jira
(v8.3.4#803005)