[ 
https://issues.apache.org/jira/browse/SPARK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zongheng Yang updated SPARK-2393:
---------------------------------

    Description: 
Spark SQL should support the common optimization known as cost estimations. For 
example, each logical operator should be able to estimate its cardinality, 
based on the estimates from its children. 

As a first step, the framework to support doing so should be added, which might 
include the interface for the aforementioned cardinality estimation, and/or 
some other metrics.

As the first proof of concept usage of this optimization, a simple optimization 
strategy for certain equi-joins should be added: namely, if one side of a 
qualifying join has a small estimated physical size (smaller than some 
threshold), the planner should use a broadcast join physical plan to execute 
the join, broadcasting the small side and streaming through the bigger side. 
Similar concept exists in Hive and is explained 
[here|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion].

> Simple cost estimation and auto selection of broadcast join
> -----------------------------------------------------------
>
>                 Key: SPARK-2393
>                 URL: https://issues.apache.org/jira/browse/SPARK-2393
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Michael Armbrust
>            Assignee: Zongheng Yang
>            Priority: Critical
>
> Spark SQL should support the common optimization known as cost estimations. 
> For example, each logical operator should be able to estimate its 
> cardinality, based on the estimates from its children. 
> As a first step, the framework to support doing so should be added, which 
> might include the interface for the aforementioned cardinality estimation, 
> and/or some other metrics.
> As the first proof of concept usage of this optimization, a simple 
> optimization strategy for certain equi-joins should be added: namely, if one 
> side of a qualifying join has a small estimated physical size (smaller than 
> some threshold), the planner should use a broadcast join physical plan to 
> execute the join, broadcasting the small side and streaming through the 
> bigger side. Similar concept exists in Hive and is explained 
> [here|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion].



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to