[ 
https://issues.apache.org/jira/browse/IMPALA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815051#comment-17815051
 ] 

Riza Suminto commented on IMPALA-12455:
---------------------------------------

{quote}This would make runtime filter producers faster and much more scalable, 
and it shouldn't have a major effect on consumers.
{quote}
While working on IMPALA-3825, I noticed that all join builder nodes tend to 
finish and send their filter updates around the same time. This might be 
because every exchange node inevitably synchronizes, waiting to receive EOS 
signals from all senders below it.
So a disjunct bloom filter implementation might be faster because it 
eliminates filter aggregation in the coordinator, but the fastest join builder 
still needs to wait for the slowest join builder to complete before it can 
publish its own bloom filter.

> Create set of disjunct bloom filters for keys in partitioned builds
> -------------------------------------------------------------------
>
>                 Key: IMPALA-12455
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12455
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: bloom-filter, performance, runtime-filters
>
> Currently Impala aggregates bloom filters from different instances of the 
> join builder by OR-ing them into a final filter. This could be avoided by 
> having num_instances smaller bloom filters and choosing the correct one 
> during lookup by applying the same hashing that is used in partitioning. 
> Each builder would only need to write a single small filter, as it has keys 
> from only a single partition. This would make runtime filter producers 
> faster and much more scalable, and it shouldn't have a major effect on 
> consumers.
> One caveat is that we push down the current bloom filter to Kudu as it is, so 
> this optimization wouldn't be applicable to filters consumed by Kudu scans.
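
The scheme in the description can be illustrated with a minimal Python sketch. This is not Impala code: the names (`partition_of`, `SmallBloomFilter`, `DisjunctBloomFilters`), the hash function, and the constants are all hypothetical and chosen only for illustration. It assumes that the lookup side can apply the same partitioning hash as the build side.

```python
import hashlib

NUM_INSTANCES = 4      # hypothetical number of join builder instances
FILTER_BITS = 1 << 12  # hypothetical size of each small filter, in bits
NUM_HASHES = 3         # hypothetical number of probe positions per key


def _hash(key, seed):
    """Seeded 64-bit hash of a key (illustrative, not Impala's hash)."""
    digest = hashlib.blake2b(
        repr(key).encode(), digest_size=8, salt=bytes([seed])
    ).digest()
    return int.from_bytes(digest, "little")


def partition_of(key):
    """Stand-in for the hash that partitions build rows across builders."""
    return _hash(key, 0) % NUM_INSTANCES


class SmallBloomFilter:
    """A plain bloom filter backed by a Python int used as a bit set."""

    def __init__(self):
        self.bits = 0

    def _positions(self, key):
        return [_hash(key, s) % FILTER_BITS for s in range(1, NUM_HASHES + 1)]

    def insert(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def may_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class DisjunctBloomFilters:
    """One small filter per builder instance; no OR-ing step is needed."""

    def __init__(self):
        self.filters = [SmallBloomFilter() for _ in range(NUM_INSTANCES)]

    def build(self, keys):
        # Each key lands in exactly one partition, so each builder only
        # ever writes to its own small filter.
        for key in keys:
            self.filters[partition_of(key)].insert(key)

    def may_contain(self, key):
        # Lookup repeats the partitioning hash to pick the right filter.
        return self.filters[partition_of(key)].may_contain(key)


if __name__ == "__main__":
    dbf = DisjunctBloomFilters()
    dbf.build(range(1000))
    # Inserted keys are always reported as possibly present.
    print(all(dbf.may_contain(k) for k in range(1000)))
```

In this sketch the consumer pays one extra hash per lookup to select the filter, while each producer touches only its own small filter and nothing has to be merged afterwards.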


