tion starting in 1.2.
-- --
From: "Lalitha MV" <lalitham...@gmail.com>
Sent: Tuesday, July 5, 2016, 2:44
To: "Sun Rui" <sunrise_...@163.com>
Cc: "Takeshi Yamamuro" <linguin@gmail.com>;
"user@spark.a
Even with preferSortMergeJoin set to false, it still only picks between
sort merge join and broadcast join, depending on the value of
autoBroadcastJoinThreshold. It does not pick shuffle hash join.
I went through SparkStrategies, and it doesn't look like there is a direct,
clean way to enforce it.
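
A rough model of the selection order may help show why this can happen. It is
plain Python, not Spark code: `Conf` and `choose_join` are hypothetical names,
only the right side is considered as the build side, and the size checks are
written from a reading of the JoinSelection strategy in SparkStrategies.scala
around Spark 2.0, so they are assumptions to verify against your version.

```python
from dataclasses import dataclass


@dataclass
class Conf:
    # Assumed defaults: 10 MB threshold, sort merge preferred, 200 partitions.
    auto_broadcast_join_threshold: int = 10 * 1024 * 1024  # bytes, -1 disables
    prefer_sort_merge_join: bool = True
    num_shuffle_partitions: int = 200


def choose_join(left_size: int, right_size: int, conf: Conf) -> str:
    """Model the choice for an inner equi-join with orderable keys."""
    # 1. Broadcast when the build side is at or under the threshold.
    if right_size <= conf.auto_broadcast_join_threshold:
        return "BroadcastHashJoin"

    # 2. Shuffle hash join needs all three conditions to hold.
    #    Note that the size limit is derived from the broadcast threshold,
    #    so a threshold of -1 makes this check fail for any table size.
    can_build_local_hash_map = (
        right_size
        < conf.auto_broadcast_join_threshold * conf.num_shuffle_partitions
    )
    much_smaller = right_size * 3 <= left_size
    if (
        not conf.prefer_sort_merge_join
        and can_build_local_hash_map
        and much_smaller
    ):
        return "ShuffledHashJoin"

    # 3. Otherwise fall back to sort merge join.
    return "SortMergeJoin"
```

Under these assumptions, turning off preferSortMergeJoin while the threshold is
-1 still ends in sort merge join, because the shuffle hash size check scales
with the threshold. Keeping a positive threshold that is smaller than the small
table is the case where the model picks shuffle hash join.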
On Mon, Jul 4,
You can try setting the “spark.sql.join.preferSortMergeJoin” conf option to false.
For detailed join strategies, take a look at the source code of
SparkStrategies.scala:
/**
* Select the proper physical plan for join based on joining keys and size of
logical plan.
*
* At first, uses the
What's the query?
On Tue, Jul 5, 2016 at 2:28 PM, Lalitha MV wrote:
> It picks sort merge join when spark.sql.autoBroadcastJoinThreshold is
> set to -1, or when the size of the small table is more than
> spark.sql.autoBroadcastJoinThreshold.
>
> On Mon, Jul 4,
It picks sort merge join when spark.sql.autoBroadcastJoinThreshold is set
to -1, or when the size of the small table is more than
spark.sql.autoBroadcastJoinThreshold.
On Mon, Jul 4, 2016 at 10:17 PM, Takeshi Yamamuro
wrote:
> The join selection can be
The join selection is described in
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L92
.
If you have join keys, you can set
`spark.sql.autoBroadcastJoinThreshold` to -1 to disable broadcast joins. Then,
hash joins are
Hi maropu,
Thanks for your reply.
Would it be possible to write a rule for this, to make it always pick
shuffle hash join over the other join implementations (i.e. sort merge and
broadcast)?
Is there any documentation demonstrating rule-based transformations of
physical plan trees?
Thanks,
Lalitha
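
For context on what such a rule could look like: the hook I am aware of in
Spark is `experimental.extraStrategies` on the session or SQL context, and
those strategies are consulted before the built-in ones. The sketch below only
models that "first strategy that produces a plan wins" idea in plain Python.
All names in it (`plan_query`, `always_shuffle_hash`, `built_in_join_selection`)
are hypothetical, and a logical plan is reduced to a dict for illustration.

```python
from typing import Callable, List, Optional

# A strategy maps a logical plan (here just a dict) to the name of a
# physical operator, or returns None when it does not apply.
Strategy = Callable[[dict], Optional[str]]


def always_shuffle_hash(plan: dict) -> Optional[str]:
    # Hypothetical user rule: claim every join that has join keys.
    if plan.get("op") == "join" and plan.get("keys"):
        return "ShuffledHashJoin"
    return None


def built_in_join_selection(plan: dict) -> Optional[str]:
    # Stand-in for the built-in selection; it always answers sort merge.
    if plan.get("op") == "join":
        return "SortMergeJoin"
    return None


def plan_query(
    plan: dict, extra: List[Strategy], built_in: List[Strategy]
) -> str:
    # Extra strategies are tried first, then the built-in ones.
    for strategy in extra + built_in:
        physical = strategy(plan)
        if physical is not None:
            return physical
    raise ValueError(f"no strategy produced a plan for {plan}")
```

With `always_shuffle_hash` registered as an extra strategy, a keyed join in this
model is planned as a shuffle hash join; without it, the built-in stand-in
decides. A real strategy would also have to plan the join's children and build
the actual physical operator, which this sketch leaves out.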
Hi,
No, Spark has no hint for the hash join.
// maropu
On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV wrote:
> Hi,
>
> In order to force broadcast hash join, we can set
> the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce
> shuffle hash join in spark
Hi,
In order to force broadcast hash join, we can set
the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce
shuffle hash join in Spark SQL?
Thanks,
Lalitha