[
https://issues.apache.org/jira/browse/IMPALA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767679#comment-17767679
]
Riza Suminto commented on IMPALA-12455:
---------------------------------------
Having filter producers send directly to consumers without going through the
coordinator would require all executors to hold the same backend_states_ and
addr_to_backend_state_. Currently, this information is centralized in the
coordinator:
[https://github.com/apache/impala/blob/4d15558b5eaa69e872917c8bbf69dc1dc2146bc5/be/src/runtime/coordinator.h#L295-L304]
> Create set of disjunct bloom filters for keys in partitioned builds
> -------------------------------------------------------------------
>
> Key: IMPALA-12455
> URL: https://issues.apache.org/jira/browse/IMPALA-12455
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Frontend
> Reporter: Csaba Ringhofer
> Priority: Major
> Labels: bloom-filter, performance, runtime-filters
>
> Currently Impala aggregates bloom filters from different instances of the
> join builder by OR-ing them into a final filter. This could be avoided by
> having num_instances smaller bloom filters and choosing the correct one
> during lookup by applying the same hash that is used for partitioning.
> Each builder would only need to write a single small filter, as it holds
> keys from a single partition only. This would make runtime filter producers
> faster and much more scalable, and should not have a major effect on
> consumers.
> One caveat is that we push down the current bloom filter to Kudu as it is, so
> this optimization wouldn't be applicable to filters consumed by Kudu scans.
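
The idea in the description can be illustrated with a minimal Python sketch. This is not Impala's implementation: the BloomFilter class, the blake2b-based hash, partition_of() and the other helper names are hypothetical and chosen only for illustration. The sketch assumes that build keys are hash-partitioned across num_instances builders, so each builder fills exactly one small filter, and the consumer picks the filter to probe with the same partitioning hash.

```python
import hashlib

PARTITION_SEED = 0  # hypothetical seed reserved for the partitioning hash


def _hash(key, seed):
    """Deterministic 64-bit hash of key for a given seed (illustrative only)."""
    digest = hashlib.blake2b(
        repr(key).encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class BloomFilter:
    """Tiny bloom filter backed by a Python int used as a bit set."""

    def __init__(self, num_bits, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Seeds start at 1 so they never collide with PARTITION_SEED.
        return [_hash(key, s + 1) % self.num_bits for s in range(self.num_hashes)]

    def insert(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def may_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def partition_of(key, num_instances):
    """Same hash on the build side (partitioning) and the probe side (lookup)."""
    return _hash(key, PARTITION_SEED) % num_instances


def build_partitioned_filters(keys, num_instances, bits_per_filter):
    """Simulates num_instances builders, each writing only its own filter."""
    filters = [BloomFilter(bits_per_filter) for _ in range(num_instances)]
    for key in keys:
        filters[partition_of(key, num_instances)].insert(key)
    return filters


def lookup(filters, key):
    """Consumer side: select one filter by partition hash, then probe it."""
    return filters[partition_of(key, len(filters))].may_contain(key)
```

With this layout no OR-ing step is needed: the set of per-partition filters is the final result, and a lookup touches one small filter instead of one large merged filter. As in any bloom filter, a lookup can return a false positive but never a false negative for an inserted key.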
--
This message was sent by Atlassian Jira
(v8.20.10#820010)