[jira] [Comment Edited] (SPARK-2468) Netty-based block server / client module

Aaron Davidson (JIRA) Wed, 05 Nov 2014 22:58:32 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199911#comment-14199911
 ]


Aaron Davidson edited comment on SPARK-2468 at 11/6/14 6:57 AM:
----------------------------------------------------------------

This could be due to the Netty transfer service allocating more off-heap byte 
buffers, which YARN may account for differently than heap memory. [PR 
#3101|https://github.com/apache/spark/pull/3101/files#diff-d2ce9b38bdc38ca9d7119f9c2cf79907R33],
 which should go in tomorrow, will include a way to avoid allocating off-heap 
buffers (by setting the Spark config 
"spark.shuffle.io.preferDirectBufs=false"), which should either solve your 
problem or at least produce the more typical OutOfMemoryError.
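
For illustration only, and assuming PR #3101 is merged with the option named 
as above, the setting might be placed in conf/spark-defaults.conf like this 
(the file location is the usual Spark default, not something stated in this 
thread):

```
# Assumption: option name as given in the comment above; only takes effect
# in builds that include PR #3101.
spark.shuffle.io.preferDirectBufs   false
```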


was (Author: ilikerps):
This could be due to the netty transfer service allocating more off-heap byte 
buffers, which perhaps is accounted for differently by YARN. [PR 
#3101|https://github.com/apache/spark/pull/3101/files#diff-d2ce9b38bdc38ca9d7119f9c2cf79907R33],
 which should go in tomorrow, will include a way to avoid allocating off-heap 
buffers, which should either solve your problem or at least produce the more 
typical OutOfMemoryError.

> Netty-based block server / client module
> ----------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now shuffle send goes through the block manager. This is inefficient 
> because it requires loading a block from disk into a kernel buffer, then into 
> a user space buffer, and then back to a kernel send buffer before it reaches 
> the NIC. It copies the data multiple times and context-switches between 
> kernel and user mode. It also creates unnecessary buffers in the JVM, which 
> increases GC pressure.
> Instead, we should use FileChannel.transferTo, which handles this in the 
> kernel space with zero-copy. See 
> http://www.ibm.com/developerworks/library/j-zerocopy/
> One potential solution is to use Netty. Spark already has a Netty-based 
> network module implemented (org.apache.spark.network.netty). However, it 
> lacks some functionality and is turned off by default.
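
The FileChannel.transferTo approach mentioned in the description can be 
sketched roughly as follows. This is a minimal, hypothetical example and not 
Spark's or Netty's actual code: the class name ZeroCopySend and the method 
sendFile are made up for illustration, and the target is assumed to be a 
blocking channel. The kernel-side zero-copy path is only taken when the JVM 
and OS support it for the given target (typically a socket channel); for 
other targets, such as the in-memory sink used in main, the JDK falls back to 
an ordinary copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, named for illustration only.
final class ZeroCopySend {

    // Sends the whole file to the target channel and returns the number of
    // bytes transferred. Assumes a blocking target channel.
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = source.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long sent = source.transferTo(position, size - position, target);
                if (sent == 0 && position >= source.size()) {
                    break; // the file shrank while we were sending it
                }
                position += sent;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("block", ".bin");
        try {
            Files.write(tmp, new byte[64 * 1024]);
            // In-memory sink for the demo; a real shuffle server would pass
            // a SocketChannel here to benefit from zero-copy.
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            long sent = sendFile(tmp, Channels.newChannel(sink));
            System.out.println("sent " + sent + " bytes");
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Because the data never passes through a user-space byte array when the 
zero-copy path is available, this avoids both the extra copies and the JVM 
buffers that the description identifies as a source of GC pressure.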



