Csaba Ringhofer created IMPALA-12455:
----------------------------------------

             Summary: Create set of disjunct bloom filters for keys in partitioned builds
                 Key: IMPALA-12455
                 URL: https://issues.apache.org/jira/browse/IMPALA-12455
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend, Frontend
            Reporter: Csaba Ringhofer


Currently Impala aggregates the bloom filters from different instances of the
join builder by OR-ing them into a final filter. This could be avoided by having
num_instances smaller bloom filters and choosing the correct one during lookup
by applying the same hash function that is used for partitioning. Each builder
would only need to write a single small filter, as it holds keys from only a
single partition. This would make runtime filter producers faster and much more
scalable, and it should not have a major effect on consumers.
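
To make the idea concrete, here is a minimal Python sketch of the proposed
scheme. It is not Impala code: the class and function names
(SmallBloomFilter, PartitionedBloomFilter, partition_of) and the choice of
hash function are assumptions made purely for illustration. The only property
it relies on is that the lookup side uses the same partitioning hash as the
builds.

```python
import hashlib


def _hash(key: str, seed: int) -> int:
    """Seeded 64-bit hash of a key (illustrative choice, not Impala's hash)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def partition_of(key: str, num_instances: int) -> int:
    """Partition a key the same way the partitioned build would (seed 0)."""
    return _hash(key, 0) % num_instances


class SmallBloomFilter:
    """A tiny bloom filter backed by a Python int used as a bitset."""

    def __init__(self, num_bits: int, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: str):
        # Seeds start at 1 so they never collide with the partitioning hash.
        return [_hash(key, s + 1) % self.num_bits for s in range(self.num_hashes)]

    def insert(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def may_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class PartitionedBloomFilter:
    """One small filter per builder instance; no OR-ing across instances."""

    def __init__(self, num_instances: int, bits_per_filter: int):
        self.num_instances = num_instances
        self.filters = [SmallBloomFilter(bits_per_filter) for _ in range(num_instances)]

    def publish(self, instance_id: int, small_filter: SmallBloomFilter) -> None:
        # Each builder writes only its own slot.
        self.filters[instance_id] = small_filter

    def may_contain(self, key: str) -> bool:
        # The consumer picks the slot with the same hash used for partitioning.
        return self.filters[partition_of(key, self.num_instances)].may_contain(key)


# Simulate four builder instances, each seeing only the keys of its partition.
keys = [f"key-{i}" for i in range(1000)]
num_instances = 4
combined = PartitionedBloomFilter(num_instances, bits_per_filter=4096)
for instance_id in range(num_instances):
    local = SmallBloomFilter(4096)
    for key in keys:
        if partition_of(key, num_instances) == instance_id:
            local.insert(key)
    combined.publish(instance_id, local)

# Bloom filters never produce false negatives, so every build key is found.
assert all(combined.may_contain(k) for k in keys)
```

In this sketch the consumer pays for one extra hash per lookup to select the
slot, while each producer touches only its own small filter.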

One caveat is that we push down the current bloom filter to Kudu as it is, so
this optimization would not be applicable to filters consumed by Kudu scans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
