[ 
https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131032#comment-14131032
 ] 

Saisai Shao commented on SPARK-2926:
------------------------------------

Hi [~matei], following your comments, I ran another round of performance 
tests on the current implementation of sort-based shuffle and on my proposal. 

For the sort-by-key case, I tested with string keys of different lengths to 
measure the key comparison overhead. Although, as you said, the comparison 
time increases, the total performance is still better than the current 
implementation. 

For the aggregation-by-key case, with different aggregation factors, the 
performance of the two implementations is closer.

I think we can use it in sortByKey() first, as you said. Besides, some code 
such as mergeSort() and mergeWithAggregation() can be shared with this 
proposal. Would you mind taking a look at the new test report and giving me 
some comments?
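
To illustrate the aggregation-by-key case, here is a minimal sketch of 
combining values once the input is already sorted by key: equal keys are 
adjacent, so they can be folded in a single pass without a hash map. The 
object name, the method name combineSorted and the parameter name 
mergeCombiners are hypothetical and chosen only for illustration; this is not 
the actual code of mergeWithAggregation().

```scala
object MergeWithAggregationSketch {
  // Hypothetical helper: fold the values of consecutive equal keys in an
  // iterator that is assumed to be already sorted by key.
  def combineSorted[K, C](sorted: Iterator[(K, C)], mergeCombiners: (C, C) => C)
                         (implicit ord: Ordering[K]): Iterator[(K, C)] = {
    val in = sorted.buffered
    new Iterator[(K, C)] {
      def hasNext: Boolean = in.hasNext
      def next(): (K, C) = {
        val (key, first) = in.next()
        var acc = first
        // Keep consuming while the next record carries the same key.
        while (in.hasNext && ord.equiv(in.head._1, key)) {
          acc = mergeCombiners(acc, in.next()._2)
        }
        (key, acc)
      }
    }
  }
}
```

The sketch only works when the ordering is total over the keys; with a 
partial ordering, equal keys are not guaranteed to be adjacent.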

Thanks a lot, and I appreciate your time.

> Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
> ------------------------------------------------------------------
>
>                 Key: SPARK-2926
>                 URL: https://issues.apache.org/jira/browse/SPARK-2926
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.1.0
>            Reporter: Saisai Shao
>         Attachments: SortBasedShuffleRead.pdf, Spark Shuffle Test Report.pdf
>
>
> Currently Spark has already integrated sort-based shuffle write, which 
> greatly improves the IO performance and reduces the memory consumption when 
> the number of reducers is very large. But the reducer side still adopts the 
> implementation of the hash-based shuffle reader, which neglects the ordering 
> attributes of map output data in some situations.
> Here we propose an MR-style, sort-merge-like shuffle reader for sort-based 
> shuffle to further improve the performance of sort-based shuffle.
> Work-in-progress code and a performance test report will be posted later, 
> once some unit test bugs are fixed.
> Any comments would be greatly appreciated. 
> Thanks a lot.
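
The merge-sort style read described above can be sketched as a k-way merge 
over map outputs that are each already sorted by key. This is only a minimal 
illustration under that assumption, not the code attached to this issue; the 
names MergeSortSketch and mergeSorted are hypothetical.

```scala
import scala.collection.mutable

object MergeSortSketch {
  // Hypothetical helper: merge several iterators, each assumed to be sorted
  // by key, into a single iterator sorted by key.
  def mergeSorted[K, V](inputs: Seq[Iterator[(K, V)]])
                       (implicit ord: Ordering[K]): Iterator[(K, V)] = {
    // Scala's PriorityQueue is a max-heap, so the ordering is reversed to
    // dequeue the iterator whose head has the smallest key first.
    val heap = mutable.PriorityQueue.empty[BufferedIterator[(K, V)]](
      Ordering.by[BufferedIterator[(K, V)], K](_.head._1)(ord).reverse)
    inputs.map(_.buffered).filter(_.hasNext).foreach(it => heap.enqueue(it))

    new Iterator[(K, V)] {
      def hasNext: Boolean = heap.nonEmpty
      def next(): (K, V) = {
        val it = heap.dequeue()
        val kv = it.next()
        // Re-insert the iterator only if it still has records to offer.
        if (it.hasNext) heap.enqueue(it)
        kv
      }
    }
  }
}
```

Each call to next() costs one key comparison per heap level, which is where 
the key comparison overhead for long string keys shows up.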



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
