[ 
https://issues.apache.org/jira/browse/IMPALA-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500534#comment-17500534
 ] 

Quanlong Huang commented on IMPALA-10898:
-----------------------------------------

I ran a perf-AB-test on TPCDS(42) using orc/snap/block: 
https://jenkins.impala.io/job/perf-AB-test/302/
It shows ~10% improvement in average query time (-9.94%) and ~7% in geometric mean (-7.22%):
{code:java}
+-----------+--------------------+---------+------------+------------+----------------+
| Workload  | File Format        | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-----------+--------------------+---------+------------+------------+----------------+
| TPCDS(42) | orc / snap / block | 7.61    | -9.94%     | 3.78       | -7.22%         |
+-----------+--------------------+---------+------------+------------+----------------+
{code}
The detailed report is attached: 
[^TPCDS-42-orc-snap-IN-list-filter-performance-result.txt]

The tests were run with --runtime_filter_wait_time_ms=10000.
The baseline commit is f42643276e64944cf0f356b64b40925183514f6f.
The target commit is d56038f4b98b17e0852909c7293874a6975c4b83, which enables 
IN-list filters by default.
Both commits are in this branch: 
[https://github.com/stiga-huang/incubator-impala/commits/in-list-filter-perf-test]

CC [~drorke], [~rizaon]

> Runtime IN-list filters for ORC tables
> --------------------------------------
>
>                 Key: IMPALA-10898
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10898
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: TPCDS-42-orc-snap-IN-list-filter-performance-result.txt
>
>
> Currently Impala has two kinds of runtime filters: bloom filters and min-max 
> filters. Unfortunately, neither can leverage the bloom filters in ORC files: 
> only EQUALS and IN-list 
> predicates can use them to skip unrelated ORC RowGroups.
> This JIRA aims to add runtime IN-list filters for the small build side (e.g. 
> #rows <= 1024) of a hash join.
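
A minimal sketch of the idea in the description, assuming a single BIGINT join key. This is not Impala's implementation: the class InListFilterSketch, the constant MAX_IN_LIST_SIZE (taken from the "#rows <= 1024" example), and the method canSkipRowGroup are all hypothetical, and a java.util.function.LongPredicate stands in for probing an ORC RowGroup's bloom filter. The filter collects distinct build-side keys and disables itself (always passes) once the build side exceeds the threshold; a RowGroup is skipped only when the bloom filter rules out every IN-list value.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.LongPredicate;

// Hypothetical sketch of a runtime IN-list filter; not Impala's actual code.
public class InListFilterSketch {
  // Assumed threshold, from the JIRA's example (#rows <= 1024).
  static final int MAX_IN_LIST_SIZE = 1024;

  private final Set<Long> values = new HashSet<>();
  // Set once the build side is too large; the filter then passes everything.
  private boolean alwaysTrue = false;

  // Add one build-side join key; disable the filter if the list grows too big.
  void insert(long key) {
    if (alwaysTrue) return;
    values.add(key);
    if (values.size() > MAX_IN_LIST_SIZE) {
      alwaysTrue = true;
      values.clear();
    }
  }

  boolean isAlwaysTrue() {
    return alwaysTrue;
  }

  // Row-level check on the probe side.
  boolean mayContain(long key) {
    return alwaysTrue || values.contains(key);
  }

  // RowGroup-level check. 'bloomMightContain' stands in for probing the
  // RowGroup's ORC bloom filter (false positives are allowed). The RowGroup
  // can be skipped only if no IN-list value might be present in it.
  boolean canSkipRowGroup(LongPredicate bloomMightContain) {
    if (alwaysTrue) return false;
    for (long v : values) {
      if (bloomMightContain.test(v)) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    InListFilterSketch filter = new InListFilterSketch();
    for (long k : List.of(3L, 7L, 42L)) filter.insert(k);

    // Simulated RowGroup whose bloom filter only matches keys in [100, 200).
    LongPredicate rowGroup = v -> v >= 100 && v < 200;
    System.out.println(filter.canSkipRowGroup(rowGroup)); // true
    System.out.println(filter.mayContain(42L)); // true
    System.out.println(filter.mayContain(5L)); // false
  }
}
```

Disabling the filter instead of truncating it keeps it correct: a partial IN-list could wrongly skip RowGroups that contain matching rows.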



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
