[jira] [Commented] (SPARK-20006) Separate threshold for broadcast and shuffled hash join

Zhan Zhang (JIRA) Fri, 17 Mar 2017 21:42:57 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931054#comment-15931054
 ]


Zhan Zhang commented on SPARK-20006:
------------------------------------

The default ShuffledHashJoin threshold can fallback to the broadcast one. A 
separate configuration does provide us opportunities to optimize the join 
dramatically. It would be great if CBO can automatically find the best 
strategy. But probably I miss something. Currently the CBO does not collect 
right statistics, especially for partitioned table. 
https://issues.apache.org/jira/browse/SPARK-19890

> Separate threshold for broadcast and shuffled hash join
> -------------------------------------------------------
>
>                 Key: SPARK-20006
>                 URL: https://issues.apache.org/jira/browse/SPARK-20006
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Zhan Zhang
>            Priority: Minor
>
> Currently both canBroadcast and canBuildLocalHashMap use the same 
> configuration: AUTO_BROADCASTJOIN_THRESHOLD. 
> But the memory model may be different. For broadcast, currently the hash map 
> is always build on heap. For shuffledHashJoin, the hash map may be build on 
> heap(longHash), or off heap(other map if off heap is enabled). The same 
> configuration makes the configuration hard to tune (how to allocate memory 
> onheap/offheap). Propose to use different configuration. Please comments 
> whether it is reasonable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SPARK-20006) Separate threshold for broadcast and shuffled hash join

Reply via email to