Aman Sinha created IMPALA-9146:
----------------------------------
Summary: Limit the size of the broadcast input on build side of
hash join
Key: IMPALA-9146
URL: https://issues.apache.org/jira/browse/IMPALA-9146
Project: IMPALA
Issue Type: Bug
Affects Versions: Impala 3.3.0
Reporter: Aman Sinha
Assignee: Aman Sinha
Since broadcast based hash joins are often chosen, we sometimes see very large
tables being broadcast, with sizes that are larger than the destination
executor's total memory. This could potentially happen if the cluster
membership is not accurately known and the planner's cost computation of the
broadcastCost vs partitionCost happens to favor the broadcast distribution.
This causes spilling and severely affects performance. Although the
DistributedPlanner does a mem_limit check before picking broadcast, the
mem_limit is not an accurate reflection since it is assigned during admission
control (See [IMPALA-988|https://issues.apache.org/jira/browse/IMPALA-988]).
Given this scenario, as a safety check it is better to have to an explicit
configurable limit for the size of the broadcast input and set it to a
reasonable default. The 'reasonable' default can be chosen based on analysis
of existing benchmark queries and representative workloads where Impala is
currently used.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]