[
https://issues.apache.org/jira/browse/IMPALA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on IMPALA-9146 started by Aman Sinha.
------------------------------------------
> Limit the size of the broadcast input on build side of hash join
> ----------------------------------------------------------------
>
> Key: IMPALA-9146
> URL: https://issues.apache.org/jira/browse/IMPALA-9146
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 3.3.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Minor
>
> Since broadcast based hash joins are often chosen, we sometimes see very
> large tables being broadcast, with sizes that are larger than the destination
> executor's total memory. This could potentially happen if the cluster
> membership is not accurately known and the planner's cost computation of the
> broadcastCost vs partitionCost happens to favor the broadcast distribution.
> This causes spilling and severely affects performance. Although the
> DistributedPlanner does a mem_limit check before picking broadcast, the
> mem_limit is not an accurate reflection since it is assigned during
> admission control (See
> [IMPALA-988|https://issues.apache.org/jira/browse/IMPALA-988]).
> Given this scenario, as a safety check it is better to have to an explicit
> configurable limit for the size of the broadcast input and set it to a
> reasonable default. The 'reasonable' default can be chosen based on analysis
> of existing benchmark queries and representative workloads where Impala is
> currently used.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]