Shivaram Venkataraman created SPARK-8137:
--------------------------------------------
Summary: Improve treeAggregate to combine all data on one machine first
Key: SPARK-8137
URL: https://issues.apache.org/jira/browse/SPARK-8137
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.4.0
Reporter: Shivaram Venkataraman
Right now, if multiple partitions reside on the same machine, treeAggregate
shuffles those partitions without first aggregating them locally. Once Spark
supports shuffle locality, we can get this for free by using the executorIds
as the keys for aggregation.
An example implementation is available at:
https://github.com/amplab/ml-matrix/blob/master/src/main/scala/edu/berkeley/cs/amplab/mlmatrix/util/Utils.scala#L96
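To make the proposal concrete, here is a minimal pure-Python simulation of the
two-level combine. It is not Spark code and not the ml-matrix implementation
linked above. It assumes each partition is tagged with the ID of the executor
that holds it; the function name aggregate_by_executor and the
(executor_id, records) input shape are hypothetical. It also counts how many
partial results would have to cross machines: one per executor rather than one
per partition.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def aggregate_by_executor(
    partitions: List[Tuple[str, List[T]]],
    zero: U,
    seq_op: Callable[[U, T], U],
    comb_op: Callable[[U, U], U],
) -> Tuple[U, int]:
    """Hypothetical sketch: combine all partitions on one executor first.

    Each partition is an (executor_id, records) pair. Returns the final result
    and the number of partial results that would cross the network.
    """
    # Step 1: fold each partition's records into a partial result using seq_op.
    per_partition = [(eid, reduce(seq_op, records, zero)) for eid, records in partitions]

    # Step 2: merge partial results that share an executor ID. In Spark this
    # would be an aggregation keyed by executor ID, which could stay on-machine
    # once shuffle locality is supported.
    per_executor: Dict[str, U] = {}
    for eid, partial in per_partition:
        if eid in per_executor:
            per_executor[eid] = comb_op(per_executor[eid], partial)
        else:
            per_executor[eid] = partial

    # Step 3: only one partial result per executor is combined across machines.
    result = reduce(comb_op, per_executor.values(), zero)
    return result, len(per_executor)


# Usage: four partitions spread over two executors.
partitions = [
    ("exec-1", [1, 2, 3]),
    ("exec-1", [4, 5]),
    ("exec-2", [6]),
    ("exec-2", [7, 8, 9, 10]),
]
total, sent = aggregate_by_executor(
    partitions, 0, lambda acc, x: acc + x, lambda a, b: a + b
)
print(total, sent)  # 55 2
```

Here only two partial results leave their machines instead of four, which is
the saving the issue describes. As with treeAggregate, the sketch assumes the
zero value is neutral for comb_op, since it is folded in more than once.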
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)