[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234876#comment-14234876 ]

Saisai Shao commented on SPARK-4740:
------------------------------------

We also tested with a smaller dataset (about 40GB), and there Netty's performance 
is similar to NIO's. My guess is that Netty is inefficient when fetching a large 
number of shuffle blocks: in our 400GB case, each reduce task needs to fetch about 
7000 shuffle blocks, and each block is only tens of KB in size. 

We will try increasing the shuffle thread number and test again. From the call 
stacks, all the shuffle clients are busy waiting on epoll_wait; I'm not sure 
whether that is expected.
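For reference, the thread-count experiment above can be tried through Spark's 
transport configuration. A minimal sketch, assuming the property names exposed by 
Spark's TransportConf (spark.shuffle.io.serverThreads, spark.shuffle.io.clientThreads, 
spark.shuffle.io.numConnectionsPerPeer); the values below are only illustrative 
starting points, not recommendations:

```shell
# Hypothetical tuning sketch: raise Netty shuffle thread counts and
# connections per peer to see whether epoll_wait contention changes.
spark-submit \
  --conf spark.shuffle.blockTransferService=netty \
  --conf spark.shuffle.io.serverThreads=64 \
  --conf spark.shuffle.io.clientThreads=64 \
  --conf spark.shuffle.io.numConnectionsPerPeer=4 \
  --class org.example.SortByKeyBenchmark \
  benchmark.jar
```

(The class and jar names are placeholders; the spark-perf harness sets its own.)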

> Netty's network bandwidth is much lower than NIO in spark-perf and Netty 
> takes longer running time
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>         Attachments: Spark-perf Test Report.pdf
>
>
> When testing the current Spark master (1.3.0-snapshot) with spark-perf 
> (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transferService 
> takes much longer than the NIO-based shuffle transferService. The network 
> throughput of Netty is only about half that of NIO. 
> We tested in standalone mode; the dataset used for the test is 20 billion 
> records, about 400GB in total. The spark-perf test runs on a 4-node cluster 
> with 10G NICs, 48 CPU cores per node, and 64GB of memory per executor. The 
> number of reduce tasks is set to 1000. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
