Shivaram Venkataraman created SPARK-2774:
--------------------------------------------

             Summary: Set preferred locations for reduce tasks
                 Key: SPARK-2774
                 URL: https://issues.apache.org/jira/browse/SPARK-2774
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Shivaram Venkataraman


Currently we do not set preferred locations for reduce tasks in Spark. This 
patch proposes setting preferred locations based on the map output sizes and 
locations tracked by the MapOutputTracker. This is useful in two cases:

1. When you have a small job in a large cluster, it can be useful to co-locate 
map and reduce tasks to avoid going over the network.
2. If there is a lot of data skew in the map stage outputs, it is 
beneficial to place the reducer close to the largest output.
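
To make the idea concrete, here is a minimal sketch in plain Python of how 
preferred locations could be derived from map output sizes and locations. It 
is not Spark code and does not use the MapOutputTracker API: the function 
name, the (host, sizes) input shape, and the min_fraction threshold are all 
assumptions made for illustration.

```python
from collections import defaultdict


def preferred_reduce_locations(map_outputs, reduce_id, min_fraction=0.2):
    """Pick hosts that hold a large share of one reducer's input.

    map_outputs: list of (host, sizes) pairs, one per map task, where
        sizes[r] is the number of bytes that task wrote for reducer r.
        This shape is hypothetical, chosen only for the sketch.
    min_fraction: assumed threshold; a host is preferred only if it holds
        at least this fraction of the reducer's total input.
    """
    # Sum the bytes destined for this reducer on each host.
    bytes_by_host = defaultdict(int)
    for host, sizes in map_outputs:
        bytes_by_host[host] += sizes[reduce_id]

    total = sum(bytes_by_host.values())
    if total == 0:
        return []  # nothing to fetch, so no preference

    # Largest holders first; ties broken by host name for determinism.
    ranked = sorted(bytes_by_host.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, size in ranked if size / total >= min_fraction]


# Hypothetical map stage: two tasks ran on host-a, one each on host-b, host-c.
map_outputs = [
    ("host-a", [100, 10]),
    ("host-b", [20, 10]),
    ("host-a", [80, 10]),
    ("host-c", [5, 900]),
]
locs_r0 = preferred_reduce_locations(map_outputs, 0)
locs_r1 = preferred_reduce_locations(map_outputs, 1)
```

With this input, reducer 0 would prefer host-a and reducer 1 would prefer 
host-c, which is the skew case from point 2. The threshold keeps the 
scheduler free to place a reducer anywhere when its input is spread evenly.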



