[jira] [Resolved] (IMPALA-3825) Distribute runtime filter aggregation across cluster

Riza Suminto (Jira) Wed, 20 Dec 2023 07:59:04 -0800


     [ https://issues.apache.org/jira/browse/IMPALA-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Riza Suminto resolved IMPALA-3825.
----------------------------------
     Fix Version/s: Impala 4.4.0
    Target Version: Impala 4.4.0  (was: Product Backlog)
        Resolution: Fixed

> Distribute runtime filter aggregation across cluster
> ----------------------------------------------------
>
>                 Key: IMPALA-3825
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3825
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>    Affects Versions: Impala 2.6.0
>            Reporter: Henry Robinson
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: runtime-filters, scalability
>             Fix For: Impala 4.4.0
>
>
> Runtime filters can be tens of MB or more, and incasting all filters from all 
> shuffle joins to the coordinator can put a lot of memory pressure on that 
> node. To alleviate this we should consider spreading out the aggregation 
> operation across the cluster, so that a different node aggregates each 
> runtime filter.
> This still restricts aggregation to as many nodes as there are runtime 
> filters, which will usually be fewer than the number of nodes in the 
> cluster. If we want to smooth that out further, we could use tree-based 
> aggregation, but let's measure the benefits of simply distributing the 
> aggregation work first.
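
To illustrate the proposal in the description, here is a minimal Python sketch, not Impala's actual implementation. It assumes each partial filter is a Bloom-filter bitmap held as an integer (so merging is a bitwise OR) and that filters are assigned to aggregating nodes round-robin; the names assign_aggregators, merge_bloom_filters, and aggregate are hypothetical.

```python
from functools import reduce


def assign_aggregators(filter_ids, nodes):
    """Pick one aggregating node per runtime filter, round-robin.

    Hypothetical policy: sort the filter ids so every participant
    derives the same assignment without extra coordination.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    ordered = sorted(filter_ids)
    return {fid: nodes[i % len(nodes)] for i, fid in enumerate(ordered)}


def merge_bloom_filters(partials):
    """Union partial Bloom filters of identical size and hash functions.

    Each partial is modeled as an int bitmap (an assumption), so the
    union is a bitwise OR.
    """
    return reduce(lambda a, b: a | b, partials, 0)


def aggregate(partials_by_filter, nodes):
    """Return {node: {filter_id: merged_bitmap}}.

    Each filter is merged only on the node that owns it, instead of
    every partial being sent to the coordinator.
    """
    owners = assign_aggregators(partials_by_filter.keys(), nodes)
    result = {}
    for fid, partials in partials_by_filter.items():
        result.setdefault(owners[fid], {})[fid] = merge_bloom_filters(partials)
    return result
```

With two filters and three nodes, only two nodes do any merging, which is the limitation the description points out: the work spreads across at most as many nodes as there are runtime filters.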



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
