[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210074#comment-14210074 ]

Aaron Davidson edited comment on SPARK-2468 at 11/13/14 5:48 PM:
-----------------------------------------------------------------

Here is my Spark configuration for the test (32 cores total). Note that this is 
a test-only configuration meant to maximize throughput; I would not recommend 
these settings for real workloads:

spark.shuffle.io.clientThreads = 16
spark.shuffle.io.serverThreads = 16
spark.serializer = "org.apache.spark.serializer.KryoSerializer"
spark.shuffle.blockTransferService = "netty"
spark.shuffle.compress = false
spark.shuffle.io.maxRetries = 0
spark.reducer.maxMbInFlight = 512

Forgot to mention, but #3155 now automatically sets 
spark.shuffle.io.clientThreads and spark.shuffle.io.serverThreads based on the 
number of cores allotted to the Executor. You can override them by setting 
those properties by hand, but ideally the default behavior is sufficient.



> Netty-based block server / client module
> ----------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now shuffle send goes through the block manager. This is inefficient 
> because it requires loading a block from disk into a kernel buffer, then into 
> a user-space buffer, and then back into a kernel send buffer before it reaches 
> the NIC. This incurs multiple copies of the data and context switches between 
> kernel and user space. It also creates unnecessary buffers in the JVM, which 
> increases GC pressure.
> Instead, we should use FileChannel.transferTo, which handles this in the 
> kernel space with zero-copy. See 
> http://www.ibm.com/developerworks/library/j-zerocopy/
> One potential solution is to use Netty. Spark already has a Netty-based 
> network module (org.apache.spark.network.netty); however, it lacks some 
> functionality and is turned off by default. 
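A minimal Scala sketch of the zero-copy path described above, using FileChannel.transferTo. This is an illustration only, not Spark's actual shuffle code; the helper name and file handling are made up for the example:

```scala
import java.io.{File, FileInputStream, FileOutputStream}

// Copies src into dest via FileChannel.transferTo. On Linux this maps to
// sendfile(2): the kernel moves the bytes directly, with no round trip
// through a user-space buffer (and no extra JVM heap buffer for the GC
// to track). When the destination is a socket channel, the same call
// sends a block straight from disk to the NIC.
def zeroCopyTransfer(src: File, dest: File): Long = {
  val in  = new FileInputStream(src).getChannel
  val out = new FileOutputStream(dest).getChannel
  try {
    val size = in.size()
    var position = 0L
    // transferTo may move fewer bytes than requested, so loop until done.
    while (position < size) {
      position += in.transferTo(position, size - position, out)
    }
    position
  } finally {
    in.close()
    out.close()
  }
}
```

The same call works with a SocketChannel as the target, which is the case that matters for shuffle sends.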



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
