[ https://issues.apache.org/jira/browse/SPARK-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhang, Liye updated SPARK-14290:
--------------------------------
Description:
When Netty transfers data that does not come from a *FileRegion*, the data is
transferred as a *ByteBuf*. If the data is large, this causes a significant
performance problem because of the underlying memory copy in
*sun.nio.ch.IOUtil.write*: CPU usage reaches 100% while network utilization
stays very low. This can be verified by comparing the *NIO* and *Netty*
settings of *spark.shuffle.blockTransferService* in Spark 1.4; NIO achieves
much better network bandwidth than Netty.
How to reproduce:
{code}
// Each of the 3 partitions emits one Array[Double] of 50 * 1024 * 1024 elements (about 400 MB).
sc.parallelize(Array(1, 2, 3), 3)
  .mapPartitions(_ => Iterator(new Array[Double](1024 * 1024 * 50)))
  .reduce((a, b) => a).length
{code}
The root cause is explained
[here|http://stackoverflow.com/questions/34493320/how-does-buffer-size-affect-nio-channel-performance].
was:
When Netty transfers data that does not come from a `FileRegion`, the data is
transferred as a `ByteBuf`. If the data is large, this causes a significant
performance problem because of the underlying memory copy in
`sun.nio.ch.IOUtil.write`: CPU usage reaches 100% while network utilization
stays very low. This can be verified by comparing the `NIO` and `Netty`
settings of `spark.shuffle.blockTransferService` in Spark 1.4; NIO achieves
much better network bandwidth than Netty.
How to reproduce:
{code}
// Each of the 3 partitions emits one Array[Double] of 50 * 1024 * 1024 elements (about 400 MB).
sc.parallelize(Array(1, 2, 3), 3)
  .mapPartitions(_ => Iterator(new Array[Double](1024 * 1024 * 50)))
  .reduce((a, b) => a).length
{code}
The root cause is explained
[here|http://stackoverflow.com/questions/34493320/how-does-buffer-size-affect-nio-channel-performance].
> Fully utilize the network bandwidth for Netty RPC by avoiding significant
> underlying memory copies
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-14290
> URL: https://issues.apache.org/jira/browse/SPARK-14290
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output, Spark Core
> Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0
> Reporter: Zhang, Liye
>
> When Netty transfers data that does not come from a *FileRegion*, the data is
> transferred as a *ByteBuf*. If the data is large, this causes a significant
> performance problem because of the underlying memory copy in
> *sun.nio.ch.IOUtil.write*: CPU usage reaches 100% while network utilization
> stays very low. This can be verified by comparing the *NIO* and *Netty*
> settings of *spark.shuffle.blockTransferService* in Spark 1.4; NIO achieves
> much better network bandwidth than Netty.
> How to reproduce:
> {code}
> // Each of the 3 partitions emits one Array[Double] of 50 * 1024 * 1024 elements (about 400 MB).
> sc.parallelize(Array(1, 2, 3), 3)
>   .mapPartitions(_ => Iterator(new Array[Double](1024 * 1024 * 50)))
>   .reduce((a, b) => a).length
> {code}
> The root cause is explained
> [here|http://stackoverflow.com/questions/34493320/how-does-buffer-size-affect-nio-channel-performance].
>
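As an illustration of the copy cost described above: when a large heap buffer is passed to a channel's write in one piece, the JDK may stage the whole remaining region in a temporary direct buffer on every call, even if the socket accepts only a small part of it. One way to bound that cost is to hand the channel a limited slice per call. The sketch below is only an assumption about how such a cap could look, not necessarily the change adopted for this issue; the names `BoundedWrite`, `writeBounded`, `writeFully` and the 256 KB `WriteLimit` are hypothetical.

```scala
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

object BoundedWrite {
  // Hypothetical cap on the bytes exposed to the channel per write() call.
  val WriteLimit: Int = 256 * 1024

  /** Writes at most `limit` bytes of `src` to `target` and returns the bytes written. */
  def writeBounded(src: ByteBuffer,
                   target: WritableByteChannel,
                   limit: Int = WriteLimit): Int = {
    val length = math.min(src.remaining(), limit)
    // slice() shares the backing storage, so narrowing its limit copies nothing.
    val slice = src.slice()
    slice.limit(length)
    val written = target.write(slice)
    // Advance the source by what the channel actually accepted.
    src.position(src.position() + written)
    written
  }

  /** Drains `src` through bounded writes. Intended for blocking channels only:
    * on a non-blocking channel write() may return 0 and this loop would spin. */
  def writeFully(src: ByteBuffer, target: WritableByteChannel): Long = {
    var total = 0L
    while (src.hasRemaining) total += writeBounded(src, target)
    total
  }
}
```

With a cap like this, each call exposes at most `WriteLimit` bytes to the channel, so any per-call staging copy is bounded by the cap rather than by the size of the whole buffer. The right value for the cap would need to be measured.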