[ 
https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Yuanjian updated SPARK-2926:
-------------------------------
    Attachment: Spark Shuffle Test Report on Spark2.x.pdf

[~jerryshao] Hi Saisai, thanks for your advice. I added a test report following 
your suggestion. As described in the report, I compare the two shuffle modes 
only in the 'sort-by-key' workload, because the other test workloads take the 
same code path (SortShuffleWriter with BlockStoreShuffleReader) in the POC 
implementation.
I also added a config ( [code 
link|https://github.com/apache/spark/pull/19745/commits/fe9394eadf8ea51af2b2cb41b5b42981fa600752]
 ) that forces SerializedShuffle off in the 'sort-by-key' workload; 
otherwise both master and the POC use SerializedShuffle.
For the sort-by-key workload with SerializedShuffle disabled, the POC version 
is 1.44x faster than current master overall: the map-side stage is 1.16x 
slower, but the reducer stage is 9.4x faster.

> Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
> ------------------------------------------------------------------
>
>                 Key: SPARK-2926
>                 URL: https://issues.apache.org/jira/browse/SPARK-2926
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.1.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>         Attachments: SortBasedShuffleRead.pdf, SortBasedShuffleReader on 
> Spark 2.x.pdf, Spark Shuffle Test Report on Spark2.x.pdf, Spark Shuffle Test 
> Report(contd).pdf, Spark Shuffle Test Report.pdf
>
>
> Spark has already integrated sort-based shuffle write, which greatly 
> improves IO performance and reduces memory consumption when the number of 
> reducers is very large. On the reducer side, however, it still uses the 
> hash-based shuffle reader implementation, which in some situations ignores 
> the ordering of the map output data.
> Here we propose an MR-style, sort-merge-like shuffle reader for sort-based 
> shuffle to further improve its performance.
> Work-in-progress code and a performance test report will be posted later, 
> once some unit test bugs are fixed.
> Any comments would be greatly appreciated. 
> Thanks a lot.
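
For readers unfamiliar with the MR-style approach mentioned in the description, 
the core idea can be illustrated with a k-way merge over map outputs that are 
already sorted by key. This is only a minimal sketch in Python, not the code 
from the POC or from Spark: the function name merge_sorted_runs, the 
(key, value) record shape, and the sample data are all hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # hypothetical (key, value) record shape


def merge_sorted_runs(runs: List[Iterable[Record]]) -> Iterator[Record]:
    """Merge several key-sorted runs into one key-sorted stream."""
    iterators = [iter(run) for run in runs]
    heap = []
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            # idx breaks ties between equal keys, so values are never compared
            heap.append((first[0], idx, first[1]))
    heapq.heapify(heap)
    while heap:
        key, idx, value = heapq.heappop(heap)
        yield (key, value)
        # Refill the heap from the run that just produced the smallest key
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt[1]))


# Stand-ins for three map outputs, each already sorted by key
map_outputs = [
    [("a", 1), ("d", 4)],
    [("b", 2), ("e", 5)],
    [("c", 3)],
]
merged = list(merge_sorted_runs(map_outputs))
```

Because each input run is already ordered, the merge keeps only one record per 
run in the heap at a time instead of re-sorting all records on the reducer 
side.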



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

