Wenzhe Zhou created IMPALA-9789:
-----------------------------------

             Summary: Disable ineffective bloom filters for Kudu scan
                 Key: IMPALA-9789
                 URL: https://issues.apache.org/jira/browse/IMPALA-9789
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend, Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Wenzhe Zhou
            Assignee: Wenzhe Zhou
             Fix For: Impala 4.0


In the bloom-filter benchmark for Kudu, query TPCH-Q9 shows a performance 
regression. The profile shows that 5 bloom filters are generated by hash joins, 
and some of those filters are not useful for filtering rows. When all bloom 
filters are pushed to Kudu, evaluating them adds extra cost to the Kudu scan, 
which causes the performance regression.
 
The regression on Q9 looks a lot like 
https://issues.apache.org/jira/browse/IMPALA-9302, where Q9 initially regressed 
a lot with multithreading because ineffective filters weren't being disabled. 
This query is a bit special in that many filters are pushed to scan 2, and most 
of them are not useful. Based on our experience there, we need to add a way to 
disable ineffective filters for Kudu scans.
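
As a rough illustration of what "disabling ineffective filters" could mean, 
here is a minimal sketch, not the actual Impala or Kudu implementation. It 
assumes that whichever side evaluates the filters can count, per filter, how 
many rows were checked and how many were rejected. The names FilterStats and 
disable_ineffective, and the thresholds min_rows and min_reject_ratio, are 
hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class FilterStats:
    """Running counters for one runtime filter (hypothetical structure)."""
    rows_considered: int = 0
    rows_rejected: int = 0
    enabled: bool = True

    def record(self, considered: int, rejected: int) -> None:
        # Accumulate the counts observed for one batch of rows.
        self.rows_considered += considered
        self.rows_rejected += rejected

    def reject_ratio(self) -> float:
        # Fraction of checked rows that this filter eliminated.
        if self.rows_considered == 0:
            return 0.0
        return self.rows_rejected / self.rows_considered


def disable_ineffective(filters, min_rows=16_384, min_reject_ratio=0.1):
    """Disable filters that reject too few rows to pay for their evaluation.

    A filter is only judged once it has seen at least `min_rows` rows, so a
    decision is not made on a tiny sample. Both thresholds are assumptions
    made for this sketch. Returns the ids of the filters disabled by this call.
    """
    newly_disabled = []
    for filter_id, stats in filters.items():
        if not stats.enabled:
            continue
        if stats.rows_considered < min_rows:
            continue
        if stats.reject_ratio() < min_reject_ratio:
            stats.enabled = False
            newly_disabled.append(filter_id)
    return newly_disabled
```

With a check like this, a filter that has seen enough rows but rejects almost 
none of them stops being evaluated, while selective filters and filters with 
too little data so far stay enabled.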



