[ 
https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379532#comment-14379532
 ] 

Saisai Shao commented on SPARK-2926:
------------------------------------

Hi [~DoingDone9], could you please share some details about your test 
environment, such as cluster size, hardware configuration, and Spark 
configuration? Could you also provide each stage's running time alongside the 
total running time? Thanks a lot.

In my local environment, a small cluster with 1 master + 4 slaves, I tested my 
patch rebased onto the latest master against the master branch itself. The 
results show that sortByKey with my patch is still about 15% to 20% faster 
than on the master branch.

I think it is possible that different hardware configurations shift the 
hardware bottleneck and lead to different results. I will investigate 
further; it would be very helpful if you could provide more detailed 
information.

> Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
> ------------------------------------------------------------------
>
>                 Key: SPARK-2926
>                 URL: https://issues.apache.org/jira/browse/SPARK-2926
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.1.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>         Attachments: SortBasedShuffleRead.pdf, Spark Shuffle Test 
> Report(contd).pdf, Spark Shuffle Test Report.pdf
>
>
> Spark has already integrated sort-based shuffle write, which greatly 
> improves IO performance and reduces memory consumption when the number of 
> reducers is very large. On the reducer side, however, it still adopts the 
> hash-based shuffle reader implementation, which in some situations neglects 
> the ordering of the map output data.
> Here we propose an MR-style, sort-merge-like shuffle reader for sort-based 
> shuffle to further improve its performance.
> Work-in-progress code and a performance test report will be posted later, 
> once some unit test bugs are fixed.
> Any comments would be greatly appreciated. 
> Thanks a lot.
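
For readers new to the issue, the idea in the description above can be sketched 
roughly as follows. This is only an illustration in plain Python, not the code 
from the patch: the function names and the sample data are hypothetical, and 
the "hash-style" baseline is a simplified stand-in that just concatenates and 
re-sorts. It assumes each map task's output for a reduce partition is already 
sorted by key, so the reducer can do a k-way merge instead of sorting again.

```python
import heapq
from operator import itemgetter


def merge_sorted_map_outputs(map_outputs):
    """Lazily k-way merge blocks that are each already sorted by key."""
    # heapq.merge keeps only one pending record per input block in its heap.
    return heapq.merge(*map_outputs, key=itemgetter(0))


def hash_style_read(map_outputs):
    """Simplified baseline: ignore the existing order, concatenate, re-sort."""
    records = [rec for block in map_outputs for rec in block]
    return sorted(records, key=itemgetter(0))


# Hypothetical sorted outputs of three map tasks for one reduce partition.
map_outputs = [
    [("a", 1), ("d", 4), ("f", 6)],
    [("b", 2), ("c", 3)],
    [("a", 7), ("e", 5)],
]

merged = list(merge_sorted_map_outputs(map_outputs))
```

Both functions return the same key-ordered records; the merge version simply 
reuses the ordering that the sort-based shuffle write has already produced.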



