Matei Zaharia created SPARK-2045:
------------------------------------
Summary: Sort-based shuffle implementation
Key: SPARK-2045
URL: https://issues.apache.org/jira/browse/SPARK-2045
Project: Spark
Issue Type: New Feature
Reporter: Matei Zaharia
Building on the pluggability in SPARK-2044, a sort-based shuffle implementation
that takes advantage of an Ordering for keys (or simply sorts by hash code for
keys that don't have one) would likely improve performance and memory usage in
very large shuffles. Our current hash-based shuffle needs an open file for each
reduce task, which can consume a lot of memory for compression buffers and
cause inefficient I/O. A sort-based shuffle would avoid both of those issues.
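
To make the idea concrete, here is a minimal in-memory sketch of the write side
of a sort-based shuffle. It is an illustration only, not Spark's implementation:
the function names (partition_of, sort_based_shuffle_write, read_partition), the
has_ordering flag, and the use of a Python list in place of an on-disk file are
all assumptions made for this example. Records are sorted by (reduce partition,
key) into one sequential output, and a small offset index replaces the
one-open-file-per-reduce-task layout of the hash-based approach.

```python
from typing import Any, Hashable, Iterable, List, Tuple

Record = Tuple[Hashable, Any]


def partition_of(key: Hashable, num_partitions: int) -> int:
    """Hypothetical partitioner: map a key to a reduce partition by hash."""
    return hash(key) % num_partitions


def sort_based_shuffle_write(
    records: Iterable[Record],
    num_partitions: int,
    has_ordering: bool = True,
) -> Tuple[List[Record], List[int]]:
    """Sort records by (partition id, key) into a single output.

    Returns (data, offsets), where partition p occupies
    data[offsets[p]:offsets[p + 1]]. A real implementation would write
    `data` to one file and store byte offsets; a list stands in for the
    file here.
    """
    def sort_key(record: Record):
        key = record[0]
        pid = partition_of(key, num_partitions)
        # Fall back to the hash code when the key type has no ordering.
        return (pid, key) if has_ordering else (pid, hash(key))

    data = sorted(records, key=sort_key)

    # Count records per partition, then prefix-sum into start offsets.
    offsets = [0] * (num_partitions + 1)
    for key, _ in data:
        offsets[partition_of(key, num_partitions) + 1] += 1
    for p in range(num_partitions):
        offsets[p + 1] += offsets[p]
    return data, offsets


def read_partition(data: List[Record], offsets: List[int], p: int) -> List[Record]:
    """Fetch the contiguous slice belonging to reduce partition p."""
    return data[offsets[p]:offsets[p + 1]]


if __name__ == "__main__":
    records = [(5, "a"), (2, "b"), (4, "c"), (1, "d"), (3, "e")]
    data, offsets = sort_based_shuffle_write(records, num_partitions=2)
    for p in range(2):
        print(p, read_partition(data, offsets, p))
```

In this sketch each map task produces a single sorted output plus an index, so
the number of open files and compression buffers no longer grows with the
number of reduce tasks.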