[ https://issues.apache.org/jira/browse/SPARK-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319056#comment-14319056 ]
Apache Spark commented on SPARK-2774:
-------------------------------------

User 'shivaram' has created a pull request for this issue:
https://github.com/apache/spark/pull/4576

> Set preferred locations for reduce tasks
> ----------------------------------------
>
>                 Key: SPARK-2774
>                 URL: https://issues.apache.org/jira/browse/SPARK-2774
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Shivaram Venkataraman
>            Assignee: Shivaram Venkataraman
>
> Currently we do not set preferred locations for reduce tasks in Spark. This
> patch proposes setting preferred locations based on the map output sizes and
> locations tracked by the MapOutputTracker. This is useful in two situations:
> 1. When you have a small job in a large cluster, it can be useful to co-locate
>    map and reduce tasks to avoid going over the network.
> 2. If there is a lot of data skew in the map stage outputs, it is
>    beneficial to place the reducer close to the largest output.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
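The placement rule proposed in SPARK-2774 (prefer the hosts that hold the most map output for a given reduce partition) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Spark's implementation: the function name `preferred_reduce_locations`, the `(host, bytes)` input format and the default `fraction_threshold` of 0.2 are all hypothetical. In Spark, the per-map output sizes and locations would come from the MapOutputTracker.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def preferred_reduce_locations(
    map_outputs: List[Tuple[str, int]],
    fraction_threshold: float = 0.2,  # hypothetical cutoff, not a Spark setting
) -> List[str]:
    """Return hosts holding at least `fraction_threshold` of one reducer's input.

    map_outputs: one (host, bytes) pair per map task, giving the size of the
    block that map task wrote for this reduce partition. This input format is
    an assumption standing in for what the MapOutputTracker records.
    """
    # Sum the bytes this reducer would fetch from each host.
    bytes_per_host: Dict[str, int] = defaultdict(int)
    total = 0
    for host, size in map_outputs:
        bytes_per_host[host] += size
        total += size
    if total == 0:
        return []  # nothing to fetch, so no locality preference
    # Keep hosts whose share of the input clears the threshold.
    chosen = [
        host for host, size in bytes_per_host.items()
        if size / total >= fraction_threshold
    ]
    # Largest local share first, so the scheduler tries the best host first.
    return sorted(chosen, key=lambda h: bytes_per_host[h], reverse=True)


# Skewed map outputs: most of the reducer's input sits on host-a.
print(preferred_reduce_locations([("host-a", 900), ("host-b", 50), ("host-c", 50)]))
# ['host-a']
```

With skewed outputs, only the host holding the bulk of the data survives the threshold (case 2 in the issue). For a small job spread over a few hosts, every host with a meaningful share is returned, so the reduce task can run next to its map output (case 1). Returning an empty list when no host qualifies leaves the scheduler free to place the task anywhere.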