[ 
https://issues.apache.org/jira/browse/IMPALA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767630#comment-17767630
 ] 

Csaba Ringhofer commented on IMPALA-12455:
------------------------------------------

[~rizaon] I assumed that on the first run we would use the existing min/max 
values for the total size of the filter, not for the per-partition filter. This 
should consume the same amount of memory on consumers and less memory on 
producers. I think that fpp should not change significantly because of this - 
writing smaller but disjunct filters should have similar fpp to writing larger 
ones and union-ing them.
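
A rough way to see why the false positive probability should stay about the same, assuming a classic Bloom filter with m bits, k hash functions and n keys. This is only an approximation: Impala's actual filters are block-based, and the symbols m, k, n and P (number of partitions) are introduced here for illustration.

```latex
% Classic Bloom filter approximation (assumed model, not Impala's exact filter):
p(m, n) \approx \left(1 - e^{-kn/m}\right)^{k}

% Current approach: P filters of m bits each, about n/P keys each, OR-ed together.
% The result has m bits with all n keys set:
p_{\mathrm{union}} \approx p(m, n)

% Proposed approach: P disjunct filters of m/P bits each, about n/P keys each.
% A lookup probes exactly one of them:
p_{\mathrm{disjunct}} \approx p\!\left(\tfrac{m}{P}, \tfrac{n}{P}\right)
  = \left(1 - e^{-k\,(n/P)/(m/P)}\right)^{k}
  = \left(1 - e^{-kn/m}\right)^{k}
  \approx p_{\mathrm{union}}
```

This assumes keys are spread evenly across partitions. With skewed partitions, an overloaded partition would see a higher false positive rate and an underloaded one a lower rate.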

> Create set of disjunct bloom filters for keys in partitioned builds
> -------------------------------------------------------------------
>
>                 Key: IMPALA-12455
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12455
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: bloom-filter, performance, runtime-filters
>
> Currently Impala aggregates bloom filters from different instances of the 
> join builder by OR-ing them to a final filter. This could be avoided by 
> having num_instances smaller bloom filters and choosing the correct one 
> during lookup by doing the same hashing as used in partitioning. Builders 
> would only need to write a single small filter as they have only keys from a 
> single partition. This would make runtime filter producers faster and much 
> more scalable, while it shouldn't have a major effect on consumers.
> One caveat is that we push down the current bloom filter to Kudu as it is, so 
> this optimization wouldn't be applicable in filters consumed by Kudu scans.
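
The lookup scheme described above can be sketched as follows. This is a minimal illustration in Python, not Impala's implementation: the class names, the `blake2b`-based hashing, and the seed choices are all assumptions made for the example. The key point is that the partition is chosen with the same hash on insert and on lookup, so each builder only writes to its own small filter and no OR-ing is needed.

```python
import hashlib


def _hash(key: str, seed: int) -> int:
    """Deterministic 64-bit hash; the seed selects an independent hash function."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class BloomFilter:
    """Plain Bloom filter (hypothetical; Impala uses a block-based variant)."""

    def __init__(self, num_bits: int, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Seeds 1..k are reserved for the bit positions inside a filter.
        return [_hash(key, s) % self.num_bits for s in range(1, self.num_hashes + 1)]

    def insert(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class PartitionedBloomFilter:
    """One small filter per partition; a lookup probes exactly one of them."""

    PARTITION_SEED = 0  # assumed to match the hash used to partition the build side

    def __init__(self, total_bits: int, num_partitions: int, num_hashes: int = 4):
        self.num_partitions = num_partitions
        # The total size is split evenly, so memory on consumers stays the same.
        self.filters = [
            BloomFilter(total_bits // num_partitions, num_hashes)
            for _ in range(num_partitions)
        ]

    def partition_of(self, key: str) -> int:
        return _hash(key, self.PARTITION_SEED) % self.num_partitions

    def insert(self, key: str) -> None:
        # A builder instance would only ever touch its own partition's filter.
        self.filters[self.partition_of(key)].insert(key)

    def might_contain(self, key: str) -> bool:
        return self.filters[self.partition_of(key)].might_contain(key)
```

In this sketch a producer for partition i would build and send only `filters[i]`, and the consumer would assemble the list without any union step. As the description notes, a consumer that expects a single filter (such as a Kudu scan) could not use this layout directly.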



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
