[
https://issues.apache.org/jira/browse/SPARK-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Armbrust resolved SPARK-2393.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.1.0
> Simple cost estimation and auto selection of broadcast join
> -----------------------------------------------------------
>
> Key: SPARK-2393
> URL: https://issues.apache.org/jira/browse/SPARK-2393
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Michael Armbrust
> Assignee: Zongheng Yang
> Priority: Critical
> Fix For: 1.1.0
>
>
> Spark SQL should support the common optimization known as cost estimations.
> For example, each logical operator should be able to estimate its
> cardinality, based on the estimates from its children.
> As a first step, the framework to support doing so should be added, which
> might include the interface for the aforementioned cardinality estimation,
> and/or some other metrics.
> As the first proof of concept usage of this optimization, a simple
> optimization strategy for certain equi-joins should be added: namely, if one
> side of a qualifying join has a small estimated physical size (smaller than
> some threshold), the planner should use a broadcast join physical plan to
> execute the join, broadcasting the small side and streaming through the
> bigger side. Similar concept exists in Hive and is explained
> [here|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion].
--
This message was sent by Atlassian JIRA
(v6.2#6252)