Re: Enforcing shuffle hash join

2016-07-05 Thread ??????
tion starting in 1.2. -- -- From: "Lalitha MV" <lalitham...@gmail.com>; Date: 2016-07-05 (Tue) 2:44; To: "Sun Rui" <sunrise_...@163.com>; Cc: "Takeshi Yamamuro" <linguin@gmail.com>; "user@spark.a

Re: Enforcing shuffle hash join

2016-07-05 Thread Lalitha MV
Even with preferSortMergeJoin set to false, it still picks only between sort merge join and broadcast join, depending on the value of autoBroadcastJoinThreshold; it never picks shuffle hash join. I went through SparkStrategies, and it doesn't look like there is a direct, clean way to enforce it. On Mon, Jul 4,
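Lalitha's observation can be checked against the size test that guards shuffle hash join. The following is a hedged reconstruction, assuming the Spark 2.0-era `JoinSelection` rules in SparkStrategies.scala (the `canBuildLocalHashMap` and `muchSmaller` checks), an inner join with orderable keys, and no broadcast hint. The symbols are introduced only for illustration: $B$ is the estimated size in bytes of the build side, $S$ that of the other side, $T$ the value of spark.sql.autoBroadcastJoinThreshold and $P$ the value of spark.sql.shuffle.partitions (defaults 10 MB and 200).

```latex
% Hedged model: conditions required for shuffle hash join with build side B
\[
  \text{ShuffledHashJoin requires}\quad
  \lnot\,\mathit{preferSortMergeJoin}
  \;\wedge\; B > T
  \;\wedge\; B < T \cdot P
  \;\wedge\; 3B \le S
\]
% Broadcast disabled with T = -1: the bound becomes negative
\[
  T = -1:\quad T \cdot P = -P < 0 \le B
  \;\Rightarrow\; B < T \cdot P \text{ never holds}
\]
% Defaults: T = 10 MB, P = 200
\[
  T = 10{,}485{,}760,\; P = 200:\quad
  T \cdot P = 2{,}097{,}152{,}000 \text{ bytes} \approx 1.95\ \text{GiB}
\]
```

Under these assumptions the build side has to fall between $T$ and $T \cdot P$ bytes (roughly 10 MB to 2 GB with the defaults) and be at least three times smaller than the other side, while a threshold of -1 rules shuffle hash join out entirely.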

Re: Enforcing shuffle hash join

2016-07-04 Thread Sun Rui
You can try setting the “spark.sql.join.preferSortMergeJoin” conf option to false. For detailed join strategies, take a look at the source code of SparkStrategies.scala: /** * Select the proper physical plan for join based on joining keys and size of logical plan. * * At first, uses the

Re: Enforcing shuffle hash join

2016-07-04 Thread Takeshi Yamamuro
What's the query? On Tue, Jul 5, 2016 at 2:28 PM, Lalitha MV wrote: > It picks sort merge join when spark.sql.autoBroadcastJoinThreshold is set to -1, or when the size of the small table is more than spark.sql.autoBroadcastJoinThreshold. > > On Mon, Jul 4,

Re: Enforcing shuffle hash join

2016-07-04 Thread Lalitha MV
It picks sort merge join when spark.sql.autoBroadcastJoinThreshold is set to -1, or when the size of the small table is more than spark.sql.autoBroadcastJoinThreshold. On Mon, Jul 4, 2016 at 10:17 PM, Takeshi Yamamuro wrote: > The join selection can be

Re: Enforcing shuffle hash join

2016-07-04 Thread Takeshi Yamamuro
The join selection logic is described in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L92 . If you have join keys, you can set `spark.sql.autoBroadcastJoinThreshold` to -1 to disable broadcast joins. Then, hash joins are
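The selection order behind the SparkStrategies.scala link can be sketched as a small Python model. This is a hedged simplification, not Spark code: it assumes the Spark 2.0-era rule order in `JoinSelection` (broadcast hash join first, then shuffled hash join only when spark.sql.join.preferSortMergeJoin is false, then sort merge join), an inner join with orderable keys, and no broadcast hint. `JoinConf`, `select_join` and the helper functions are hypothetical names; the size arguments stand in for the optimizer's `sizeInBytes` estimates.

```python
"""Simplified, hypothetical model of Spark 2.0-era join selection.

Assumption: mirrors the rule order of JoinSelection in SparkStrategies.scala
(broadcast hash join, then shuffled hash join, then sort merge join) for an
inner equi-join with orderable keys and no broadcast hint. This is NOT Spark
code or a Spark API; sizes stand in for the optimizer's sizeInBytes estimates.
"""
from dataclasses import dataclass

MB = 1024 * 1024
GB = 1024 * MB


@dataclass
class JoinConf:
    auto_broadcast_threshold: int = 10 * MB  # spark.sql.autoBroadcastJoinThreshold (default 10 MB)
    shuffle_partitions: int = 200            # spark.sql.shuffle.partitions (default 200)
    prefer_sort_merge_join: bool = True      # spark.sql.join.preferSortMergeJoin (default true)


def can_broadcast(size: int, conf: JoinConf) -> bool:
    # A side qualifies for broadcast when its estimated size is at most the threshold.
    return size <= conf.auto_broadcast_threshold


def can_build_local_hash_map(size: int, conf: JoinConf) -> bool:
    # Rough per-partition bound: the whole side must be smaller than threshold * partitions.
    # With a threshold of -1 the bound is negative, so this is never true.
    return size < conf.auto_broadcast_threshold * conf.shuffle_partitions


def much_smaller(a: int, b: int) -> bool:
    # The build side must be at least 3x smaller than the other side.
    return a * 3 <= b


def select_join(left_size: int, right_size: int, conf: JoinConf) -> str:
    """Return the join strategy this model picks for an inner equi-join."""
    # 1. Broadcast hash join: try the right side as build side, then the left.
    if can_broadcast(right_size, conf) or can_broadcast(left_size, conf):
        return "BroadcastHashJoin"
    # 2. Shuffled hash join is only considered when sort merge join is not preferred.
    if not conf.prefer_sort_merge_join:
        for build, other in ((right_size, left_size), (left_size, right_size)):
            if can_build_local_hash_map(build, conf) and much_smaller(build, other):
                return "ShuffledHashJoin"
    # 3. Otherwise fall back to sort merge join (keys assumed orderable).
    return "SortMergeJoin"


if __name__ == "__main__":
    big, mid, small = 50 * GB, 500 * MB, 5 * MB
    print(select_join(big, small, JoinConf()))  # BroadcastHashJoin
    print(select_join(big, mid, JoinConf()))  # SortMergeJoin
    print(select_join(big, mid, JoinConf(prefer_sort_merge_join=False)))  # ShuffledHashJoin
    print(select_join(big, mid, JoinConf(auto_broadcast_threshold=-1,
                                         prefer_sort_merge_join=False)))  # SortMergeJoin
```

In this model, setting the threshold to -1 does disable broadcast hash join, but it also makes the shuffled-hash-join size check impossible, so the plan falls back to sort merge join. Join selection has changed in later releases (Spark 3.0, for example, added a SHUFFLE_HASH join hint), so the rules should be checked against SparkStrategies.scala for the version in use.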

Re: Enforcing shuffle hash join

2016-07-04 Thread Lalitha MV
Hi maropu, Thanks for your reply. Would it be possible to write a rule for this, so that it always picks shuffle hash join over the other join implementations (i.e., sort merge and broadcast)? Is there any documentation demonstrating rule-based transformations of physical plan trees? Thanks, Lalitha

Re: Enforcing shuffle hash join

2016-07-02 Thread Takeshi Yamamuro
Hi, No, Spark has no hint for the hash join. // maropu On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV wrote: > Hi, > > In order to force broadcast hash join, we can set the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce shuffle hash join in spark

Enforcing shuffle hash join

2016-07-01 Thread Lalitha MV
Hi, In order to force broadcast hash join, we can set the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce shuffle hash join in Spark SQL? Thanks, Lalitha