[
https://issues.apache.org/jira/browse/IMPALA-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500534#comment-17500534
]
Quanlong Huang commented on IMPALA-10898:
-----------------------------------------
I ran a perf-AB-test on TPCDS(42) using orc/snap/block:
https://jenkins.impala.io/job/perf-AB-test/302/
It shows a ~10% improvement in average query time (~7% in geometric mean):
{code:java}
+-----------+--------------------+---------+------------+------------+----------------+
| Workload  | File Format        | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-----------+--------------------+---------+------------+------------+----------------+
| TPCDS(42) | orc / snap / block | 7.61    | -9.94%     | 3.78       | -7.22%         |
+-----------+--------------------+---------+------------+------------+----------------+
{code}
The detailed report is attached:
[^TPCDS-42-orc-snap-IN-list-filter-performance-result.txt]
The tests were run with --runtime_filter_wait_time_ms=10000.
The baseline commit is f42643276e64944cf0f356b64b40925183514f6f.
The target commit is d56038f4b98b17e0852909c7293874a6975c4b83, which enables
the IN-list filter by default.
They are in this branch:
[https://github.com/stiga-huang/incubator-impala/commits/in-list-filter-perf-test]
CC [~drorke], [~rizaon]
> Runtime IN-list filters for ORC tables
> --------------------------------------
>
> Key: IMPALA-10898
> URL: https://issues.apache.org/jira/browse/IMPALA-10898
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Attachments: TPCDS-42-orc-snap-IN-list-filter-performance-result.txt
>
>
> Currently Impala has two kinds of runtime filters: bloom filters and min-max
> filters. Unfortunately, they can't leverage the bloom filters in ORC files.
> Only EQUALS and IN-list predicates can leverage them to skip unrelated ORC
> RowGroups.
> This JIRA aims to add runtime IN-list filters for a small build side (e.g.
> #rows <= 1024) of a hash join.
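
To illustrate the idea in the description, here is a minimal Python sketch. It is not Impala's implementation. The names `build_in_list_filter`, `select_row_groups` and `MAX_IN_LIST_ENTRIES` are hypothetical, the 1024 threshold is taken from the example in the description, and each ORC RowGroup bloom filter is modelled as a plain "might contain" callable.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Optional

# Assumed threshold, taken from the "#rows <= 1024" example in the description.
MAX_IN_LIST_ENTRIES = 1024


def build_in_list_filter(
    build_keys: Iterable[Optional[Hashable]],
    max_entries: int = MAX_IN_LIST_ENTRIES,
) -> Optional[FrozenSet[Hashable]]:
    """Collect distinct join keys from the build side of a hash join.

    Returns None when the build side has too many distinct keys, meaning
    the IN-list filter is disabled and every RowGroup must be read.
    """
    values = set()
    for key in build_keys:
        if key is None:
            continue  # NULL keys never match in an equi-join
        values.add(key)
        if len(values) > max_entries:
            return None
    return frozenset(values)


def select_row_groups(
    in_list: Optional[FrozenSet[Hashable]],
    row_group_probes: List[Callable[[Hashable], bool]],
) -> List[int]:
    """Return indexes of RowGroups that may contain at least one IN-list value.

    Each probe stands in for a per-RowGroup bloom filter lookup: it may
    return false positives but never false negatives.
    """
    if in_list is None:
        return list(range(len(row_group_probes)))
    return [
        i
        for i, might_contain in enumerate(row_group_probes)
        if any(might_contain(v) for v in in_list)
    ]


# Usage: three RowGroups, modelled as exact sets for simplicity.
row_groups = [{1, 2, 3}, {10, 11}, {7}]
probes = [rg.__contains__ for rg in row_groups]
in_list = build_in_list_filter([3, 7, 7, None])
selected = select_row_groups(in_list, probes)
```

In this sketch the second RowGroup is skipped because none of the build-side keys can appear in it. A build side with more than 1024 distinct keys yields no filter and falls back to reading all RowGroups.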