[ 
https://issues.apache.org/jira/browse/IMPALA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9146 started by Aman Sinha.
------------------------------------------
> Limit the size of the broadcast input on build side of hash join
> ----------------------------------------------------------------
>
>                 Key: IMPALA-9146
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9146
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.3.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Minor
>
> Since broadcast based hash joins are often chosen, we sometimes see very 
> large tables being broadcast, with sizes that are larger than the destination 
> executor's total memory.  This could potentially happen if the cluster 
> membership is not accurately known and the planner's cost computation of the 
> broadcastCost vs partitionCost happens to favor the broadcast distribution.  
> This causes spilling and severely affects performance.  Although the 
> DistributedPlanner does a mem_limit check before picking broadcast, the 
> mem_limit is not an accurate reflection since  it is assigned during 
> admission control (See 
> [IMPALA-988|https://issues.apache.org/jira/browse/IMPALA-988]).   
> Given this scenario, as a safety check it is better to have to an explicit 
> configurable limit for the size of the broadcast input and set it to a 
> reasonable default.  The 'reasonable' default can be chosen based on analysis 
> of existing benchmark queries and representative workloads where Impala is 
> currently used. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to