[ 
https://issues.apache.org/jira/browse/IMPALA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17777504#comment-17777504
 ] 

Riza Suminto commented on IMPALA-12455:
---------------------------------------

Just found a note saying that Kudu's block bloom filter, which Impala uses, 
currently only works with FastHash:

[https://github.com/apache/impala/blob/b15d6dc2e7df05392a1daa4bc1b3da9ca31a583b/be/src/util/bloom-filter.h#L73-L75]
 

> Create set of disjunct bloom filters for keys in partitioned builds
> -------------------------------------------------------------------
>
>                 Key: IMPALA-12455
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12455
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: bloom-filter, performance, runtime-filters
>
> Currently Impala aggregates bloom filters from different instances of the 
> join builder by OR-ing them to a final filter. This could be avoided by 
> having num_instances smaller bloom filters and choosing the correct one 
> during lookup by doing the same hashing as used in partitioning. Builders 
> would only need to write a single small filter as they have only keys from a 
> single partition. This would make runtime filter producers faster and much 
> more scalable, and it shouldn't have a major effect on consumers.
> One caveat is that we push down the current bloom filter to Kudu as it is, so 
> this optimization wouldn't be applicable in filters consumed by Kudu scans.
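
To illustrate the idea in the description above, here is a minimal Python sketch of a set of per-partition bloom filters where the lookup picks the filter using the same hash as the partitioning. It is not Impala's implementation: the names (SimpleBloom, PartitionedBloom, partition_of, build_partitioned), the BLAKE2b-based hashing and the filter sizes are all assumptions chosen for illustration.

```python
import hashlib


def _hash64(key: bytes, salt: bytes) -> int:
    # Hypothetical hash helper: BLAKE2b stands in for whatever hash the
    # engine really uses (an assumption made for this sketch).
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def partition_of(key: bytes, num_partitions: int) -> int:
    # The same function must route rows to builders and route lookups to filters.
    return _hash64(key, b"partition") % num_partitions


class SimpleBloom:
    """A small bloom filter backed by a Python int used as a bit set."""

    def __init__(self, num_bits: int, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: bytes):
        for i in range(self.num_hashes):
            yield _hash64(key, b"bloom%d" % i) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class PartitionedBloom:
    """One small filter per build partition; no OR-ing across builders."""

    def __init__(self, num_partitions: int, bits_per_partition: int):
        self.filters = [SimpleBloom(bits_per_partition)
                        for _ in range(num_partitions)]

    def publish(self, partition: int, bloom: SimpleBloom) -> None:
        # A builder hands over the single filter for its own partition.
        self.filters[partition] = bloom

    def might_contain(self, key: bytes) -> bool:
        # Choose the filter with the same hash that partitioned the build side.
        part = partition_of(key, len(self.filters))
        return self.filters[part].might_contain(key)


def build_partitioned(keys, num_partitions: int, bits_per_partition: int):
    # Simulate the builders: each one only sees keys of its own partition.
    per_builder = [SimpleBloom(bits_per_partition)
                   for _ in range(num_partitions)]
    for key in keys:
        per_builder[partition_of(key, num_partitions)].add(key)
    result = PartitionedBloom(num_partitions, bits_per_partition)
    for part, bloom in enumerate(per_builder):
        result.publish(part, bloom)
    return result
```

In this sketch each simulated builder writes only to its own small filter, and the final structure is assembled by collecting the filters rather than OR-ing them. A consumer would call might_contain on the PartitionedBloom, which costs one extra partition hash per lookup compared with a single merged filter.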



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

