Zoltán Borók-Nagy created IMPALA-10910:
------------------------------------------
Summary: Iceberg scans don't apply runtime filters at Parquet row
group level
Key: IMPALA-10910
URL: https://issues.apache.org/jira/browse/IMPALA-10910
Project: IMPALA
Issue Type: Bug
Reporter: Zoltán Borók-Nagy
From a performance test on TPC-DS 3000 executed by [~rizaon] we noticed that
runtime filters are only applied at row level.
It is known that runtime filters are not applied at file/partition level on
Iceberg tables (IMPALA-10453). But they could be applied at Parquet row group
level. I think achieving this is much easier than fixing IMPALA-10453.
E.g. here is a snippet of the runtime profile of q49 of TPC-DS:
{noformat}
Filter 0 (8.00 KB) [108 instances]:
- Files processed: 0 (0)
- Files rejected: 0 (0)
- Files total: 0 (0)
- InactiveTotalTime: 0.000ns
- RowGroups processed: 0 (0)
- RowGroups rejected: 0 (0)
- RowGroups total: 0 (0)
- Rows processed: 19.34M (19335783)
- Rows rejected: 19.32M (19323695)
- Rows total: 20.00M (19999711)
- Splits processed: 0 (0)
- Splits rejected: 0 (0)
- Splits total: 0 (0)
- TotalTime: 0.000ns
{noformat}
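To make the counters above easier to read, here is a small arithmetic sketch that uses only the numbers from this profile snippet. The variable names are illustrative and are not Impala profile keys.

```python
# Counters copied from the "Filter 0" profile snippet above.
rows_total = 19_999_711
rows_processed = 19_335_783
rows_rejected = 19_323_695

# Rows that survived the runtime filter at row level.
rows_passed = rows_processed - rows_rejected

# Fraction of the processed rows that the filter threw away.
reject_ratio = rows_rejected / rows_processed

print(f"rows passed: {rows_passed}")
print(f"rejected share of processed rows: {reject_ratio:.4%}")
```

In other words, the filter is highly selective, yet every one of those rows was read and decoded before being dropped, because the file, split and row group counters are all zero.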
We could save a lot of IO by applying the filters at row group level.
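As a rough illustration of the idea (not Impala's actual scanner code), the sketch below checks a min/max style runtime filter against per-row-group column statistics and skips row groups whose value range cannot overlap the filter. All names here (MinMaxFilter, RowGroupStats, plan_row_groups) are hypothetical, and it assumes the row group carries min/max statistics for the filtered column.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MinMaxFilter:
    """Hypothetical runtime filter: build-side values lie in [lo, hi]."""
    lo: int
    hi: int


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical view of Parquet column statistics for one row group."""
    min_value: int
    max_value: int
    num_rows: int


def may_contain_match(flt: MinMaxFilter, stats: RowGroupStats) -> bool:
    # The row group can only hold matching rows if its value range
    # overlaps the filter's range.
    return stats.max_value >= flt.lo and stats.min_value <= flt.hi


def plan_row_groups(
    flt: MinMaxFilter, row_groups: List[RowGroupStats]
) -> Tuple[List[RowGroupStats], int]:
    # Keep only row groups that might match; count rows we never read.
    kept = [rg for rg in row_groups if may_contain_match(flt, rg)]
    skipped_rows = sum(rg.num_rows for rg in row_groups) - sum(
        rg.num_rows for rg in kept
    )
    return kept, skipped_rows


if __name__ == "__main__":
    flt = MinMaxFilter(lo=100, hi=200)
    groups = [
        RowGroupStats(0, 50, 1000),     # entirely below the filter range
        RowGroupStats(150, 300, 1000),  # overlaps the filter range
        RowGroupStats(201, 400, 500),   # entirely above the filter range
    ]
    kept, skipped = plan_row_groups(flt, groups)
    print(len(kept), skipped)
```

With a check like this, the "RowGroups rejected" counter would become non-zero and the rejected row groups would not need to be read at all. A bloom filter could be handled in a similar spirit, but that is outside this sketch.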