[jira] [Created] (SPARK-8137) Improve treeAggregate to combine all data on one machine first

Shivaram Venkataraman (JIRA) Fri, 05 Jun 2015 17:21:24 -0700

Shivaram Venkataraman created SPARK-8137:
--------------------------------------------


             Summary: Improve treeAggregate to combine all data on one machine first
                 Key: SPARK-8137
                 URL: https://issues.apache.org/jira/browse/SPARK-8137
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.4.0
            Reporter: Shivaram Venkataraman


Right now, if multiple partitions reside on the same machine, treeAggregate 
shuffles them without first aggregating them locally. Once we have support 
for shuffle locality, we can get this local aggregation for free by using 
the executorIds as the keys for aggregation. An example implementation is at 
https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/util/Utils.scala#L96
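
To make the idea concrete, here is a minimal sketch in plain Scala 
collections. It does not use Spark and is not the ml-matrix code linked 
above. The names Partition, LocalFirstAggregate and aggregateByExecutorFirst 
are hypothetical, and each partition is assumed to already know the 
executorId it lives on. It also assumes combOp is associative and 
commutative, since partials are combined in no particular order.

```scala
// Plain-Scala simulation of "aggregate per executor first".
// All names here are hypothetical; none of this is Spark API.
object LocalFirstAggregate {
  // A partition tagged with the (assumed known) executor that holds it.
  final case class Partition[T](executorId: String, data: Seq[T])

  def aggregateByExecutorFirst[T, U](partitions: Seq[Partition[T]], zero: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U = {
    // Step 1: fold each partition on its own, keeping its executorId as key.
    val perPartition: Seq[(String, U)] =
      partitions.map(p => p.executorId -> p.data.foldLeft(zero)(seqOp))

    // Step 2: combine partial results that share an executorId. In a real
    // cluster this step would need no network transfer.
    val perExecutor: Seq[U] =
      perPartition.groupBy(_._1).values.map(_.map(_._2).reduce(combOp)).toSeq

    // Step 3: merge the per-executor results. Only one value per executor
    // would have to be shuffled at this point.
    perExecutor.reduceOption(combOp).getOrElse(zero)
  }
}
```

For example, with partitions Seq(1, 2) and Seq(3) on "exec-1" and Seq(4, 5) 
on "exec-2", calling aggregateByExecutorFirst(parts, 0)(_ + _, _ + _) 
combines 3 and 3 locally on "exec-1" before the final merge with 9.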






