[ 
https://issues.apache.org/jira/browse/IMPALA-10910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10910:
---------------------------------------
    Component/s: Backend

> Iceberg scans don't apply runtime filters at Parquet row group level
> --------------------------------------------------------------------
>
>                 Key: IMPALA-10910
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10910
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>
> From a performance test on TPC-DS 3000 executed by [~rizaon] we noticed that 
> runtime filters are only applied at row level.
> It is known that runtime filters are not applied at file/partition level on 
> Iceberg tables (IMPALA-10453). But they could be applied at Parquet row group 
> level. I think achieving this is much easier than fixing IMPALA-10453.
> E.g. here is a snippet of the runtime profile of q49 of TPC-DS:
> {noformat}
>         Filter 0 (8.00 KB) [108 instances]:
>            - Files processed: 0 (0)
>            - Files rejected: 0 (0)
>            - Files total: 0 (0)
>            - InactiveTotalTime: 0.000ns
>            - RowGroups processed: 0 (0)
>            - RowGroups rejected: 0 (0)
>            - RowGroups total: 0 (0)
>            - Rows processed: 19.34M (19335783)
>            - Rows rejected: 19.32M (19323695)
>            - Rows total: 20.00M (19999711)
>            - Splits processed: 0 (0)
>            - Splits rejected: 0 (0)
>            - Splits total: 0 (0)
>            - TotalTime: 0.000ns
> {noformat}
> We could save a lot of IO by applying the filters at row group level.
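
To put the counters in the quoted profile in perspective, here is a minimal sketch that computes how selective Filter 0 was at row level. The numbers are copied from the profile; the helper name `rejection_ratio` is hypothetical and only for illustration. Note the assumption: this measures row-level selectivity only. How much IO row-group-level filtering would actually save depends on how the surviving rows are spread across row groups, which the profile does not show.

```python
# Counters copied from the "Filter 0" section of the q49 profile quoted above.
rows_processed = 19_335_783
rows_rejected = 19_323_695


def rejection_ratio(rejected: int, processed: int) -> float:
    """Fraction of processed items that the filter rejected (0.0 if none were processed)."""
    return rejected / processed if processed else 0.0


# Row-level selectivity of the runtime filter.
row_ratio = rejection_ratio(rows_rejected, rows_processed)
# Rows that passed the filter and continued through the scan.
rows_passed = rows_processed - rows_rejected

print(f"rows rejected: {row_ratio:.4%}, rows passed: {rows_passed}")
```

The same helper applied to the row group counters (0 processed, 0 rejected) returns 0.0, which matches the observation that the filter was never evaluated at that level.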



