[ https://issues.apache.org/jira/browse/SPARK-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-2774:
-------------------------------
    Assignee: Shivaram Venkataraman

> Set preferred locations for reduce tasks
> ----------------------------------------
>
>                 Key: SPARK-2774
>                 URL: https://issues.apache.org/jira/browse/SPARK-2774
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Shivaram Venkataraman
>            Assignee: Shivaram Venkataraman
>
> Currently we do not set preferred locations for reduce tasks in Spark. This
> patch proposes setting preferred locations based on the map output sizes and
> locations tracked by the MapOutputTracker. This is useful in two conditions
> 1. When you have a small job in a large cluster it can be useful to co-locate
> map and reduce tasks to avoid going over the network
> 2. If there is a lot of data skew in the map stage outputs, then it is
> beneficial to place the reducer close to the largest output.
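
The placement idea in the issue description can be illustrated with a small standalone sketch. This is not the Spark patch and does not use Spark's API: the function name `preferred_reducer_locations`, the input layout (a list of `(host, sizes)` pairs, one per map task), and the `fraction_threshold` cutoff are all assumptions made for illustration. The issue itself only says that preferences are derived from map output sizes and locations.

```python
from collections import defaultdict


def preferred_reducer_locations(map_outputs, reduce_id, fraction_threshold=0.2):
    """Return hosts holding a large share of one reducer's input.

    map_outputs: list of (host, sizes) pairs, one per map task, where
        sizes[r] is the number of bytes that task wrote for reducer r.
        (Hypothetical layout; a stand-in for what a map output tracker knows.)
    fraction_threshold: assumed cutoff; a host is preferred only if it
        holds at least this fraction of the reducer's total input.
    """
    # Sum the bytes destined for this reducer on each host.
    bytes_per_host = defaultdict(int)
    for host, sizes in map_outputs:
        bytes_per_host[host] += sizes[reduce_id]

    total = sum(bytes_per_host.values())
    if total == 0:
        return []  # Nothing to fetch, so no host is preferable.

    # Keep hosts above the cutoff, largest share first (ties broken by name).
    preferred = [
        (host, size)
        for host, size in bytes_per_host.items()
        if size / total >= fraction_threshold
    ]
    preferred.sort(key=lambda pair: (-pair[1], pair[0]))
    return [host for host, _ in preferred]


# Skewed output: host-a holds 400 of the 500 bytes bound for reducer 0.
outputs = [
    ("host-a", [100, 10]),
    ("host-b", [20, 10]),
    ("host-a", [300, 10]),
    ("host-c", [80, 10]),
]
print(preferred_reducer_locations(outputs, reduce_id=0))
print(preferred_reducer_locations(outputs, reduce_id=1))
```

For reducer 0 only host-a clears the cutoff, which matches the skew case (2) in the description. For reducer 1 the data is spread evenly, so every host qualifies and the preference carries little information.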