[
https://issues.apache.org/jira/browse/SPARK-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen updated SPARK-6026:
------------------------------
Component/s: Shuffle
> Eliminate the bypassMergeThreshold parameter and associated hash-ish shuffle
> within the Sort shuffle code
> ---------------------------------------------------------------------------------------------------------
>
> Key: SPARK-6026
> URL: https://issues.apache.org/jira/browse/SPARK-6026
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 1.3.0
> Reporter: Kay Ousterhout
>
> The bypassMergeThreshold parameter (and associated use of a hash-ish shuffle
> when the number of partitions is less than this) is basically a workaround
> for SparkSQL, because the fact that the sort-based shuffle stores
> non-serialized objects is a deal-breaker for SparkSQL, which re-uses objects.
> Once the sort-based shuffle is changed to store serialized objects, we
> should never be secretly doing hash-ish shuffle even when the user has
> specified to use sort-based shuffle (because of its otherwise worse
> performance).
> [~rxin][~adav], masters of shuffle, it would be helpful to get agreement from
> you on this proposal (and also a sanity check that I've correctly
> characterized the issue).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]