[
https://issues.apache.org/jira/browse/SPARK-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219875#comment-15219875
]
Apache Spark commented on SPARK-14293:
--------------------------------------
User 'peterpc0701' has created a pull request for this issue:
https://github.com/apache/spark/pull/12085
> Improve shuffle load balancing and minimize network data transmission
> ---------------------------------------------------------------------
>
> Key: SPARK-14293
> URL: https://issues.apache.org/jira/browse/SPARK-14293
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.6.1
> Reporter: Cheng Pei
> Labels: performance
>
> Currently, Spark provides a mechanism to set preferred locations for reduce
> tasks. When the fraction of a reduce task's total map output held at a
> location is equal to or greater than REDUCER_PREF_LOCS_FRACTION, that
> location becomes a preferred location for the task. However, this does not
> take load balancing or network transmission into account. Based on the map
> output sizes and locations tracked by MapOutputTracker, we can achieve
> better load balancing.
>
> This patch proposes a strategy for setting the preferred location of each
> reduce task that, first, keeps the amount of intermediate data processed by
> each executor roughly equal and, second, minimizes network data
> transmission. This helps in the following cases:
> 1. REDUCER_PREF_LOCS_FRACTION tries to place reduce tasks close to their
> largest map output. If the map outputs are skewed, executors that hold a
> large share of the map outputs can become overloaded. Our method avoids
> this case and minimizes network data transmission.
> 2. When a job has a large number of reduce tasks, it helps each executor
> process roughly the same amount of data, keeping the load balanced.
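
To make the contrast in the description concrete, here is a minimal sketch
in Python. It is not the code from the pull request. The input layout
(sizes[location][reducer] in bytes), the function names, the 0.2 default
threshold, and the greedy rule are all assumptions chosen for illustration.
The first function mimics a fraction-threshold rule; the second assigns the
largest reduce tasks first, each to the least-loaded location, breaking ties
in favor of the location that already holds the most of that task's input.

```python
# Hypothetical map output sizes in bytes: sizes[location][reducer_id].
sizes = {
    "hostA": {0: 90, 1: 85, 2: 5},
    "hostB": {0: 10, 1: 15, 2: 15},
}


def threshold_pref_locs(sizes, reducer, fraction=0.2):
    """Threshold-style rule: every location holding at least `fraction`
    of the reducer's total input is a preferred location.
    The 0.2 default is an assumption for this sketch."""
    total = sum(by_reducer.get(reducer, 0) for by_reducer in sizes.values())
    if total == 0:
        return []
    return sorted(
        loc
        for loc, by_reducer in sizes.items()
        if by_reducer.get(reducer, 0) / total >= fraction
    )


def balanced_pref_locs(sizes, reducers):
    """Greedy sketch (an assumed strategy, not the patch itself):
    balance the total input per location first, then prefer the
    location with the most local data for the reducer."""
    totals = {
        r: sum(by_reducer.get(r, 0) for by_reducer in sizes.values())
        for r in reducers
    }
    load = {loc: 0 for loc in sizes}
    assignment = {}
    # Place the largest reducers first so small ones can even out the load.
    for r in sorted(reducers, key=lambda r: (-totals[r], r)):
        best = min(
            sizes,
            key=lambda loc: (load[loc], -sizes[loc].get(r, 0), loc),
        )
        assignment[r] = best
        load[best] += totals[r]
    return assignment


if __name__ == "__main__":
    for r in (0, 1, 2):
        print(r, threshold_pref_locs(sizes, r))
    print(balanced_pref_locs(sizes, [0, 1, 2]))
```

With the skewed sample data, the threshold rule prefers hostA for both large
reduce tasks (0 and 1), while the greedy sketch places task 0 on hostA and
tasks 1 and 2 on hostB. The trade-off is that task 1 then fetches most of
its input over the network, which is why balance and locality have to be
weighed against each other.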