Saisai Shao created SPARK-2926:
----------------------------------
Summary: Add MR-style (merge-sort) SortShuffleReader for
sort-based shuffle
Key: SPARK-2926
URL: https://issues.apache.org/jira/browse/SPARK-2926
Project: Spark
Issue Type: Improvement
Components: Shuffle
Affects Versions: 1.1.0
Reporter: Saisai Shao
Spark already includes a sort-based shuffle writer, which greatly improves
I/O performance and reduces memory consumption when the number of reducers
is very large. The reducer side, however, still uses the hash-based shuffle
reader implementation, which in some situations ignores the ordering of the
map output data.
Here we propose an MR-style (sort-merge) shuffle reader for sort-based
shuffle, to further improve its performance.
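
To illustrate the general idea (not the proposed implementation, which is
not posted yet), here is a minimal Python sketch of a merge-sort style read
path. It assumes each map output destined for this reducer is already sorted
by key, so the reducer can do a streaming k-way merge instead of hashing
every record. The function names merge_sorted_map_outputs and reduce_by_key
are hypothetical and are not Spark APIs.

```python
import heapq
from functools import reduce
from itertools import groupby
from operator import itemgetter


def merge_sorted_map_outputs(map_outputs):
    """Lazily k-way merge runs of (key, value) pairs.

    Assumption: every run in map_outputs is already sorted by key,
    as it would be if the map side wrote sorted output.
    """
    return heapq.merge(*map_outputs, key=itemgetter(0))


def reduce_by_key(map_outputs, combine):
    """Combine values per key while streaming over the merged runs.

    Equal keys are adjacent in the merged stream, so only one key's
    accumulator is held at a time; no hash map of all keys is built.
    """
    merged = merge_sorted_map_outputs(map_outputs)
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, reduce(combine, (value for _, value in group))


if __name__ == "__main__":
    # Three hypothetical map outputs, each sorted by key.
    runs = [
        [("a", 1), ("c", 3)],
        [("a", 2), ("b", 5)],
        [("b", 1), ("c", 1)],
    ]
    for key, total in reduce_by_key(runs, lambda x, y: x + y):
        print(key, total)
```

The merge keeps one cursor per run, so its working set grows with the number
of map outputs rather than with the number of distinct keys, which is the
property a sort-merge reader exploits.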
Work-in-progress code and a performance test report will be posted later,
once some unit test bugs are fixed.
Any comments would be greatly appreciated.
Thanks a lot.