[
https://issues.apache.org/jira/browse/SPARK-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931050#comment-15931050
]
Takeshi Yamamuro commented on SPARK-20006:
------------------------------------------
I feel more options we have for controlling plan strategies, more difficult
users use DataFrame/Dataset. Essentially, CBO should control these kinds of
things, I think.
> Separate threshold for broadcast and shuffled hash join
> -------------------------------------------------------
>
> Key: SPARK-20006
> URL: https://issues.apache.org/jira/browse/SPARK-20006
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Zhan Zhang
> Priority: Minor
>
> Currently both canBroadcast and canBuildLocalHashMap use the same
> configuration: AUTO_BROADCASTJOIN_THRESHOLD.
> But the memory model may be different. For broadcast, currently the hash map
> is always build on heap. For shuffledHashJoin, the hash map may be build on
> heap(longHash), or off heap(other map if off heap is enabled). The same
> configuration makes the configuration hard to tune (how to allocate memory
> onheap/offheap). Propose to use different configuration. Please comments
> whether it is reasonable.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]